1  Introduction


In the past decade, a powerful theory for designing robust control systems has emerged.
Starting with a model and a description of the uncertainty (structured, parametric, etc.),
a controller can be designed to meet a variety of performance specifications. This
development, however, has not been accompanied by a parallel development in system
identification methods that provide a plant model together with a description of its
uncertainty. In an attempt to bridge this gap, a new area of research in robust
identification has emerged in the last few years. This research is motivated (in part) by
the following:

1.  It  is  evident  that  a  “good”  controller  cannot  be  designed  based  entirely  on  a  model 
without  a  description  of  plant  uncertainty  [15,  16,  8].  Current  identification  schemes 
do  not  provide  information  about  plant  uncertainty  that  is  usable  by  current  robust 
control  techniques  [36]. 

2.  The  failure  of  most  adaptive  systems  is  a  consequence  of  the  failure  of  the  identification 
scheme  within  the  adaptive  controller.  This  failure  can  be  described  either  in  terms 
of  parameter  convergence  (a  traditional  and  possibly  inappropriate  description),  or  in 
terms  of  plant  uncertainty  [2,  44]. 

3.  Much of the research done up to now on system identification has assumed that the
noise process is stochastic, e.g., filtered white noise, with stationarity being an
important side assumption. A lot of attention has been paid to showing convergence, as well
as to deriving bounds on confidence intervals, all asymptotically. Not much effort was
put into problems with finite data and possibly nonstationary noise.

4.  The status of spectral estimation remains as in Jenkins and Watts [27]. For
nonstationary noise, much of that theory does not yield satisfactory results.

5.  There was very little understanding of the fundamental limitations of system
identification in the presence of different classes of noise, and when the objective is to
reduce the plant uncertainty given only finite data.

6.  The  limitations  of  controller  design  when  only  finite  corrupted  data  is  available  are 
not  well  understood.  In  that  sense,  the  available  tools  from  robust  control  are  not  well 
connected  with  experiments,  and  the  assumptions  underlying  the  existing  paradigms 
may  be  somewhat  unrealistic. 

As a result, there has been increasing interest, in both the control and identification
communities, in the problem of identifying plants for control purposes. This generally means
that the identified model should approximate the plant as it operates on a rich class of signals,
namely signals with bounded norm, since this allows for the immediate use of robust control
tools for designing controllers. This problem is of special importance when the data are
corrupted with bounded noise. The case where the objective is to optimize prediction for
a fixed input was analyzed by many researchers [18, 34, 37, 38, 39, 42]. The problem is
more interesting when the objective is to approximate the original system as an operator, a
problem extensively discussed in [55]. For linear time invariant plants, such approximation
can be achieved by uniformly approximating the frequency response (in the $\mathcal{H}_\infty$ norm) or
the impulse response (in the $\ell_1$ norm). In identification, it was shown that robustly
convergent algorithms can be furnished when the available data are in the form of a corrupted
frequency response at a set of points dense on the unit circle [22, 23, 24, 20, 21]. When the
topology is induced by the $\ell_1$ norm, a complete study of asymptotic identification was given
in our past work [52, 53, 54] for arbitrary inputs, and the question of optimal input design
was addressed as well. Related work on this problem was also reported in [19, 26, 31, 35,
40, 41, 48].

Another  issue  of  importance  in  the  context  of  worst-case  identification  is  complexity.  It 
turns  out  that  it  is  generally  much  harder  to  devise  experiments  that  can  guarantee  small 
worst-case  errors  in  the  presence  of  bounded  noise.  This  problem  has  been  extensively 
analyzed in our work [11] and elsewhere [43, 33].

A word of caution about the meaning of “worst-case” errors is in order at this point:
the terminology “worst-case” does not mean that one can furnish guarantees on the
worst-case error with respect to the actual plant. Clearly, any result we obtain is a function
of  prior  assumptions  (which  are  not  verifiable  in  general),  and  thus  the  results  hold  only  when 
these  assumptions  are  valid.  This  is  no  different  from  the  traditional  stochastic  approach  for 
system  identification.  One  cannot  derive  guarantees  about  the  actual  plant,  from  only  finite 
data,  without  additional  assumptions  about  the  set  of  possible  plants,  and  any  methodology 
will  be  subject  to  this  limitation. 

Even with this recent development, system identification and robust control remain
separate fields. The estimates of uncertainty obtained from the above methods tend to be quite
conservative, which renders them useless for robust control methods. A framework
unifying the controller design problem has to be iterative in nature, and robust control methods
should  play  a  role  in  the  selection  of  experiments  for  the  next  iteration.  In  this  sense,  the 
hypothesized  model  structures  should  include  a  description  of  the  uncertainty  (that  will  not 
be  identified).  Once  such  a  model  is  described,  a  controller  can  be  designed  based  on  this 
description.  The  signals  used  to  test  this  controller  should  provide  further  useful  data  to 
tune  this  model  further  and  obtain  better  performance  at  the  next  iteration.  Needless  to  say, 
a computable theory of this kind is still not available. Iterative identification/control
methods have already been discussed in the literature (see [1, 30, 46, 56] for example). However,
current approaches are based on simply attempting to identify the system in closed loop,
refining the control design and the identified model as the iterations proceed. As such, these
methods merely aim towards a particular closed loop model (for a specific controller). Even
though these methods depart from the traditional system identification approach, they still
do not provide a framework in which information from a previous iteration reduces the plant
uncertainty for the next iteration. What is lacking is a general and systematic means to
exploit powerful robust control design and set membership identification techniques, and hence
provide an identification and control design methodology with firm performance guarantees.

Our research addresses the general controller design problem starting from finite
corrupted data and some prior information. On one hand, we will study the identification
problem in the presence of deterministic/stochastic noise, and study the fundamental
limitations and capabilities of identification in such a setup. In particular, we will study the
problem of translating this coarse description of the experimental setup into a precise
description of a plant and uncertainty. On the other hand, we will develop robust control
techniques  to  handle  the  most  general  robust  performance  problem.  We  will  show  how  these 
can  be  integrated  into  one  framework  in  which  identification  and  control  are  done  in  an 
iterative  fashion.  While  this  will  provide  a  procedure  for  a  systematic  design,  it  is  still  far 
from  feasible  with  current  methods,  and  our  research  will  concentrate  on  providing  the  tools 
for  implementing  it. 

1.1  Summary  of  Past  Accomplishments 

Our past research has concentrated on developing a theoretical foundation for system
identification in the presence of deterministic noise. In particular, the work of Tse et al. [10, 52,
53, 54] allows for the analysis of large classes of systems, including nonlinear fading memory
systems. The study is done in two steps. The first step is concerned with obtaining tight
upper  and  lower  bounds  on  the  optimal  achievable  error,  for  a  given  fixed  experiment.  The 
second  step  is  to  study  these  bounds  and  characterize  the  inputs  that  will  minimize  them. 
The  upper  and  lower  bounds  are  obtained  under  some  mild  topological  assumptions  on  the 
model  set,  and  for  any  fixed  experiment,  through  the  diameter  of  the  worst-case  uncertainty 
set, a concept borrowed from Information Based Complexity [49, 50].

Using  this  formulation,  we  have  studied  in  detail  several  model  sets  containing  linear  time 
invariant  stable  systems.  We  also  analyzed  the  sample  complexity  in  the  case  of  unknown 
bounded  noise. 

Our  research  in  robust  control  has  concentrated  on  developing  computational  methods 
for solving the $\ell_1$ robust control problem. These methods can be extended to incorporate
additional  frequency-domain  and  time-domain  constraints  that  are  not  directly  captured  by 
standard  theory.  The  methods  provide  bounds  on  the  optimal  achievable  performance  and 
give  information  about  the  structure  of  the  optimal  controller.  This  work  has  formed  the 
basis  of  some  software  tools  that  we  have  developed  for  designing  control  systems  in  the 
presence  of  mixed  objectives.  Finally,  in  a  related  effort,  some  major  open  problems  in 
robust  control  have  been  addressed  using  the  theory  of  computational  complexity. 

A  last  area  of  research  has  dealt  with  the  foundations  of  learning  theory,  as  developed  by 
computer  scientists  and  statisticians,  with  the  objective  of  linking  it  to  the  basic  problems 
of  learning  that  arise  in  control  theory. 

Part  of  our  effort  has  been  channelled  towards  education.  In  that  regard,  we  have  written 
a  textbook  explaining  the  current  robust  control  paradigm  emphasizing  computations.  The 
book is titled Control of Uncertain Systems: A Linear Programming Approach (by Dahleh
and  Diaz-Bobillo).  In  the  book,  we  present  a  unifying  theory  for  robust  control  that  is  quite 
accessible  to  engineers  at  all  levels.  This  will  help  in  bridging  the  existing  gap  between 
theory  and  applications. 

2  Details  of  Past  Research 

2.1  Robust  Identification 

We  consider  a  framework  for  system  identification  which  is  meant  to  provide  not  only  a 
nominal  model  for  an  unknown  plant,  but  also  some  hard  guarantees  on  the  distance  of  the 
true model from the nominal. For clarity of exposition, the discussion that follows is based
on  a  concrete  set  of  assumptions.  However,  the  framework  is  more  general  and  we  discuss 
alternative  settings  as  we  proceed. 

We start with a model set $M$ which is meant to capture any prior information we might
have on the unknown system to be identified. For example, $M$ could be the set of all stable
linear time invariant systems, or the set of all LTI systems with a finite impulse response of
length $N$. Let $U$ be the set of all inputs $u(\cdot)$ such that $|u(t)| \le 1$ for all $t$. Finally, let $V$
be the set of all output disturbances $d(\cdot)$ that satisfy $|d(t)| \le \delta$ for all $t$, where $\delta$ is a given
constant. We then consider the following sequence of events. We choose some input function
$u \in U$ and apply it to the unknown system $h \in M$. We observe a noise-corrupted output of
the system, of the form

$$ y = u * h + d, \qquad (1) $$

where $d \in V$. Based on the observation $y$ and our knowledge of $u$, $M$, and $V$, we can form
the uncertainty set $\Omega(y)$, which is the set of all models that are possible given the information
that we have:

$$ \Omega(y) = \{\, h \in M \mid \exists\, d \in V \text{ such that } y = u * h + d \,\}. \qquad (2) $$

We might choose an element $\hat{h}$ of $\Omega(y)$ and call it the estimate of $h$. The worst case error is

$$ E(y) = \sup \{\, \|\hat{h} - h\| \mid h \in \Omega(y) \,\}, $$

where $\|\cdot\|$ is a norm on the set of all plants. In fact, no matter how we choose $\hat{h}$, we have

$$ \tfrac{1}{2}\,\mathrm{diam}(\Omega(y)) \le E(u) \le \mathrm{diam}(\Omega(y)). $$

Thus, instead of focusing on any particular estimate, we might concentrate on the diameter
of the uncertainty set $\Omega(y)$ and view it as a measure of the identification accuracy we have
achieved.
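
As a concrete illustration of the uncertainty set in (2), the following minimal sketch tests
whether a candidate model is unfalsified by a finite data record. It assumes a model set of
FIR systems, a finite-length experiment, and zero initial conditions; the function names
`convolve` and `is_unfalsified` are hypothetical and are not taken from the work cited here.

```python
def convolve(u, h, n):
    """Return the first n samples of the causal convolution (u * h)(t)."""
    return [
        sum(h[k] * u[t - k] for k in range(min(t, len(h) - 1) + 1))
        for t in range(n)
    ]


def is_unfalsified(h, u, y, delta, tol=1e-12):
    """True if some disturbance with |d(t)| <= delta explains y = u * h + d.

    For a fixed candidate h the only possible disturbance is d = y - u * h,
    so membership in the uncertainty set reduces to a bound on the residual.
    """
    predicted = convolve(u, h, len(y))
    return all(abs(yt - pt) <= delta + tol for yt, pt in zip(y, predicted))


# Example record: true model [0.5, -0.25], noise bound delta = 0.1 (assumed values).
u_example = [1.0, -1.0, 1.0, 1.0]
y_example = [0.6, -0.85, 0.8, 0.25]
consistent = is_unfalsified([0.5, -0.25], u_example, y_example, 0.1)
inconsistent = is_unfalsified([0.9, -0.25], u_example, y_example, 0.1)
```

Every candidate that passes this test belongs to the uncertainty set, so the spread of the
passing candidates gives a direct, if crude, picture of its diameter.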

In earlier research [52, 53], we have provided a conceptual foundation for the approach
outlined above. We have proved that, in the limit of very long experiments, the best
achievable diameter $\inf_{u \in U} E(u)$ is either equal to $2\delta$ or infinite. Which of the two is
the case depends on the underlying model set, that is, on the amount of prior information
available. This allows us to say that some model sets are learnable and some are not,
depending on the value of $\inf_{u \in U} E(u)$. We have also shown that a model set is learnable if
and only if, under our experimental setup, we can distinguish between stable and unstable
plants.

In another study [11, 51], we focused on worst-case identification, under the $\ell_1$ norm
error criterion, of plants with a finite impulse response. Although this is a learnable model
set (the worst-case error can be made as small as $2\delta$), we have proved that the experiment
length must be an exponential function of the length of the impulse response, even if we are
willing to settle for an error which is within a constant factor of $\delta$. This result suggests
that the standard assumptions used in worst-case identification are too conservative to be
practical, and that some probabilistic aspects should be introduced.
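
To see why the choice of experiment matters in this setting, consider the following worked
example. It is only a sketch under assumptions not stated in the text: the model set is taken
to be all FIR systems of length $N$ with no further constraint, the input is a single unit
impulse, and the experiment lasts at least $N$ samples.

```latex
% Unit-impulse experiment: u(0) = 1 and u(t) = 0 for t >= 1, so that
% y(t) = h(t) + d(t) for 0 <= t <= N-1, with |d(t)| <= \delta.
\Omega(y) = \{\, h : |y(t) - h(t)| \le \delta,\ 0 \le t \le N-1 \,\},
\qquad
\mathrm{diam}_{\ell_1}(\Omega(y))
  = \sup_{h_1, h_2 \in \Omega(y)} \sum_{t=0}^{N-1} |h_1(t) - h_2(t)|
  = 2 N \delta .
```

A single impulse therefore leaves an $\ell_1$ uncertainty that grows linearly with $N$, a
factor of $N$ above the $2\delta$ limit; closing that gap is what calls for much longer experiments.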

Our most recent work in this area [5] has used an alternative and possibly more realistic
model of the noise sequence $d$, commonly referred to as “deterministic white noise.” With this
model, the set $V$ of admissible disturbances is constrained further by requiring the sample
autocorrelation of the disturbance sequence $d$ to be low, which provides a deterministic
counterpart of white noise. Our work has provided upper and lower bounds on the
worst-case diameter of the uncertainty set, as well as exponential lower bounds on the length of
the experiments required to obtain a small enough diameter.
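
The constraint on the disturbance set can be pictured with a short sketch. The precise
definition used in [5] is not reproduced here, so the normalization by the record length, the
threshold `gamma`, the range of lags checked, and the function names are all assumptions
made for illustration.

```python
def sample_autocorrelation(d, lag):
    """Sample autocorrelation of d at the given lag, normalized by len(d)."""
    n = len(d)
    return sum(d[t] * d[t + lag] for t in range(n - lag)) / n


def is_deterministic_white(d, delta, gamma, max_lag):
    """Check a bounded-amplitude and a low-autocorrelation condition.

    delta   : amplitude bound, |d(t)| <= delta
    gamma   : assumed bound on |autocorrelation| for lags 1..max_lag
    """
    bounded = all(abs(x) <= delta for x in d)
    low_correlation = all(
        abs(sample_autocorrelation(d, lag)) <= gamma
        for lag in range(1, max_lag + 1)
    )
    return bounded and low_correlation


# A length-7 Barker sequence has small off-peak autocorrelation,
# whereas an alternating sequence is strongly correlated at lag 1.
barker7 = [1, 1, 1, -1, -1, 1, -1]
alternating = [1, -1, 1, -1, 1, -1, 1]
barker_ok = is_deterministic_white(barker7, 1.0, 0.2, 3)
alternating_ok = is_deterministic_white(alternating, 1.0, 0.2, 3)
```

Both sequences satisfy the same amplitude bound, so a purely bounded-noise model cannot
tell them apart; the autocorrelation condition is what excludes the second one.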

2.2  Robust  Control 

We  summarize  below  our  research  accomplishments  in  the  area  of  robust  control. 

1.  Computation of $\ell_1$ Optimal Solutions:

The contributions in this regard are marked by the introduction of the Delay Augmentation
Algorithm for solving nonsquare problems (e.g., problems with more regulated variables
than actuators) [13, 14]. This algorithm is based on squaring the system by introducing
fictitious delayed inputs and outputs. The problem is solved iteratively as the number of delays
increases. At each iteration, a square $\ell_1$ problem is solved (the solution of which is known
exactly). The main features of this algorithm are that: (1) at each iteration it gives upper
and lower bounds of the optimal objective function which are convergent; (2) it provides
information about the structure of the controller; (3) it does not cause order inflation (it is
not based on FIR approximations); (4) it involves solving one linear program iteratively. In
many cases, the exact solution for nonsquare problems is provided.

For implementation purposes, all computations are performed using matrix algebra,
often exploiting the structure of Toeplitz matrices resulting from convolution operators. An
example of that is the development of methods for computing directions of zeros with
multiplicity using Toeplitz matrix manipulation, without ever computing the Smith-McMillan
Decomposition.
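
The link between convolution operators and Toeplitz matrices can be sketched for a scalar
FIR system as follows. This is only an illustration of the matrix structure, not of the
software described here; the helper names are hypothetical, and the finite horizon `n` is an
assumption.

```python
def toeplitz_of(h, n):
    """Lower-triangular n x n Toeplitz matrix of the convolution operator h."""
    return [
        [h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def convolve(u, h):
    """First len(u) samples of the causal convolution (u * h)(t)."""
    return [
        sum(h[k] * u[t - k] for k in range(min(t, len(h) - 1) + 1))
        for t in range(len(u))
    ]


def induced_peak_gain(matrix):
    """Largest absolute row sum: the gain from peak input to peak output."""
    return max(sum(abs(a) for a in row) for row in matrix)


h_example = [1.0, 2.0, 3.0]
u_example = [1.0, 0.0, -1.0, 2.0]
T = toeplitz_of(h_example, len(u_example))
via_matrix = matvec(T, u_example)       # same samples as convolve(u_example, h_example)
peak_gain = induced_peak_gain(T)        # equals sum(|h(k)|) once n >= len(h)
```

Once the horizon is at least as long as the impulse response, the largest absolute row sum
of the Toeplitz matrix equals the $\ell_1$ norm of the impulse response, which is why $\ell_1$
problems of this kind lend themselves to linear programming over such matrices.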

A  major  part  of  this  research  that  parallels  our  research  in  computation  has  been  the 
development of software. Using this software, we have studied a variety of benchmark
problems (e.g., the X29 Aircraft, an earth-observing system (EOS-AM), a flexible beam, a high
purity  distillation  column).  The  following  are  the  main  new  features  of  the  software. 

1.  Using  Delay  Augmentation  as  the  main  core  for  nonsquare  problems. 

2.  Characterizing feasible subspaces by zeros. The computations involve lower triangular
block-Toeplitz matrices.

3.  All  necessary  computations  are  in  state-space. 

4.  Optimization  involves  solving  linear  programs. 

2.  Robustness  Analysis  and  Synthesis 

This is concerned with the development of a computational theory that directly addresses
uncertain plants. The uncertainty is structured in nature, possibly time varying, but
nonparametric. In this regard, we have built on the results in [9, ?] to come up with simple
conditions for robust analysis in the presence of structured uncertainty [8]. This can be
readily generalized to MIMO perturbation blocks. The conditions are stated in terms of the
spectral radius of a matrix constructed from computing the $\ell_1$ norms of certain closed loop
maps. We have also analyzed the case of time-invariant perturbations when stability is
required and have shown that the natural conditions are in the frequency domain (they coincide
with the standard $\mu$ results).

Since the spectral radius of a positive matrix can be computed by minimizing a scaled $\ell_1$
norm, synthesis for structured uncertainty problems involves iterations between solving an
$\ell_1$ problem and finding optimal scales for the uncertainty. We have analyzed this algorithm
in detail, and have shown its limitations. We have also proposed an alternative algorithm
based on sensitivity analysis of the linear programming solution of the $\ell_1$ problem [45].
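
The computation behind these conditions can be sketched for FIR closed loop maps. The code
below builds the matrix of $\ell_1$ norms, finds its spectral radius by power iteration, and
checks that a diagonal scaling by the Perron eigenvector makes the largest row sum equal to
that spectral radius. It is an illustration under assumptions: the maps are taken to be FIR
with all norms strictly positive, the max-row-sum norm is assumed to be the scaled norm in
question, and the function names are hypothetical. The exact form of the robustness test is
given in [8] and is not reproduced here.

```python
def l1_norm(h):
    """l1 norm of an FIR impulse response."""
    return sum(abs(v) for v in h)


def l1_norm_matrix(maps):
    """Matrix whose (i, j) entry is the l1 norm of the (i, j) closed loop map."""
    return [[l1_norm(h) for h in row] for row in maps]


def perron_root(m, iters=500):
    """Spectral radius and positive eigenvector of a positive matrix (power iteration)."""
    n = len(m)
    x = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        rho = max(y)                  # x is normalized so that max(x) == 1
        x = [v / rho for v in y]
    return rho, x


def scaled_row_sum_norm(m, scales):
    """Max row sum of D^{-1} M D with D = diag(scales)."""
    n = len(m)
    return max(
        sum(m[i][j] * scales[j] for j in range(n)) / scales[i] for i in range(n)
    )


# Hypothetical 2 x 2 array of closed loop impulse responses.
maps_example = [[[0.1], [0.5, -0.3]], [[0.2], [0.05, 0.05]]]
M_example = l1_norm_matrix(maps_example)
rho_example, x_example = perron_root(M_example)
unscaled = scaled_row_sum_norm(M_example, [1.0, 1.0])
optimally_scaled = scaled_row_sum_norm(M_example, x_example)
```

In this example the unscaled row-sum norm is larger than the spectral radius, and the
Perron scaling closes the gap, which is the sense in which finding scales and computing the
spectral radius are the same problem for a positive matrix.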

Finally,  we  have  looked  at  some  of  the  basic  problems  of  robust  control,  using  the  tools 
of  the  theory  of  computational  complexity.  For  example,  suppose  that  we  are  given  interval 
matrices A, B, and C (that is, matrices with each entry being a range of possible values).
A most basic problem in robust control is to find a feedback gain matrix K such that
A + BKC is stable for all possible matrices A, B, C whose entries are within the allowed
ranges. We have shown that a version of this problem, as well as some related problems
in decentralized control and simultaneous stabilization, are NP-hard [4]. This means that
under  the  prevailing  conjectures  in  computational  complexity  theory,  these  problems  are  not 
efficiently  solvable.  Results  of  this  type  are  useful  in  determining  the  fundamental  limits 
of  what  types  of  solutions  to  such  problems  are  possible,  and  also  determine  what  kind  of 
research  can  be  meaningfully  pursued. 
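
A brute-force view of the interval matrix problem gives some intuition for this hardness
result. The sketch below is not the construction used in [4]. It assumes a fixed 2 x 2
interval matrix (no feedback gain is being searched for), continuous-time (Hurwitz)
stability, and hypothetical function names. It checks only the vertex matrices, which in
general is a necessary condition and not a sufficient one.

```python
from itertools import product


def is_hurwitz_2x2(a, b, c, d):
    """A real 2 x 2 matrix [[a, b], [c, d]] is Hurwitz iff trace < 0 and det > 0."""
    return (a + d) < 0 and (a * d - b * c) > 0


def vertices_all_stable(intervals):
    """Check every vertex of a 2 x 2 interval matrix.

    intervals: four (low, high) pairs for the entries a, b, c, d.
    """
    return all(is_hurwitz_2x2(*vertex) for vertex in product(*intervals))


def vertex_count(n):
    """Number of vertex matrices of an n x n interval matrix with all entries uncertain."""
    return 2 ** (n * n)


stable_family = [(-2, -1), (0, 1), (-1, 0), (-3, -2)]
unstable_family = [(-1, 1), (0, 0), (0, 0), (-1, -1)]
first_ok = vertices_all_stable(stable_family)
second_ok = vertices_all_stable(unstable_family)
```

Even this partial check enumerates a number of cases that grows exponentially with the
number of uncertain entries, for instance `vertex_count(5)` is already over thirty million.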

3.  Writing  a  Book  on  Robust  Control 

The  book  titled:  Control  of  Uncertain  Systems:  A  Linear  Programming  Approach  written 
by  Dahleh  and  Diaz-Bobillo  presents  a  unified  treatment  of  the  theory  of  robust  control  design 
with  emphasis  on  computational  methods.  It  can  serve  as  a  starting  point  for  researchers  in 
the  field  as  well  as  a  textbook  for  a  graduate  class  in  control.  In  our  opinion,  this  is  the  only 
book available that gives a comprehensive treatment of $\mathcal{H}_2$, $\mathcal{H}_\infty$, and $\ell_1$ methods integrated
in  a  robust  performance  framework,  with  emphasis  on  computations. 

2.3  Identification  for  Controller  Design 

The traditional route to controller design has been to first perform some system
identification, so as to abstract a mathematical model from the physical process. This model is
then used as the basis for the design of the controller. In some instances the controller may
not have the desired properties when implemented on the actual system. In these cases, one
has to go back and alter the design in some fashion. However, it is often not clear whether
the  fault  lies  in  the  controller  design  process,  the  system  identification  procedure,  or  stems 
from  the  fact  that  one  has  not  taken  enough  data  to  properly  identify  the  plant,  or  has  set 
performance  specifications  that  are  simply  too  stringent  to  be  met. 

The  problems  arise  from  the  fact  that  this  traditional  route  is  rather  ad-hoc,  so  that  when 
it  fails  one  does  not  know  where  to  lay  the  blame.  We  would  like  to  develop  a  framework  for 
addressing  these  issues  in  a  systematic  and  quantifiable  fashion.  In  order  to  do  so,  we  first 
note  that  our  ultimate  goal  is  to  design  a  controller  which  meets  the  required  performance 
specifications  on  the  actual  plant.  With  this  observation  we  see  that  there  is  no  necessity 
to  artificially  split  this  design  process  into  an  identification  procedure,  and  a  control  law 
design. Moreover, we believe that by so doing one throws away a lot of potentially useful
information. 

1.  Problem  Definition

The  basic  control  problem  we  consider  may  be  stated  as  follows:  Given  some  prior 
information  about  the  process  and  a  set  of  finite  data,  design  a  feedback  controller  that  meets 
the  given  performance  specifications.  We  propose  to  develop  a  framework  for  addressing 
this  problem  by  considering  an  integrated  system  identification  and  design  process.  The 
resulting  procedure  should  allow  us  to  incorporate  an  array  of  system  identification  and 
controller  design  methodologies,  so  that  we  may  adapt  our  methods  to  make  use  of  the 
latest  tools.  We  will  require  that  the  design  procedure  be  systematic,  so  that  at  each  stage 
the  next  course  of  action  is  clear,  and  the  procedure  terminates  with  a  successful  controller 
design  or  the  conclusion  that  the  performance  specifications  cannot  be  met  (subject  to  our 
prejudice).

3.  An  Iterative  Formulation 

Rather  than  assuming  that  the  prior  information  is  true,  it  is  more  natural  to  think  of 
it  as  a  parametrization  of  model  structures  from  which  we  desire  to  explain  the  data,  i.e.,  a 
description  of  our  prejudice.  In  this  sense,  prior  information  can  itself  be  invalidated  by  the 
data.  This  distinction  is  crucial  since  such  information  is  generally  derived  from  simplified 
models  of  the  process,  and  hence  is  not  verifiable.  Once  a  set  of  finite  data  is  acquired,  a 
set  of  models  that  are  consistent  with  the  data  and  the  model  structure  parametrization 
(prior  information)  is  defined.  This  set  contains  all  models  that  are  not  falsified  by  the  data. 
Roughly  speaking,  system  identification  picks  a  most  powerful  unfalsified  model  where  most 
powerful  is  defined  depending  on  the  objective  in  mind.  In  this  case  it  is  finding  a  controller 
that  delivers  a  given  performance  level.  We  also  note  that  the  process  of  finding  such  a 
model,  and  a  controller,  is  iterative  in  nature  as  more  sets  of  data  are  acquired. 

It is evident that any iterative scheme will generally be based on reducing the set of
unfalsified plants until a controller based on the remaining elements can deliver the performance
when  connected  with  the  actual  process,  or  a  decision  is  made  to  enlarge  the  parametrized 
set  of  models  and/or  change  the  performance  requirement.  We  propose  a  general  scheme 
that  is  based  on  efficiently  eliminating  models  from  the  set  of  unfalsified  models.  Of  course, 
the  acquisition  of  more  data  systematically  reduces  this  set,  although  the  efficiency  of  this 
depends  on  the  data  set  itself.  On  the  other  hand,  an  unfalsified  model  is  invalidated  if 
there  exists  a  controller  that  delivers  the  required  performance  for  this  model  and  the  same 
controller  does  not  meet  the  performance  with  the  actual  process.  Given  our  prejudice,  this 
model is unacceptable. Finally, an unfalsified model for which no controller can be designed
to meet the performance specifications is discarded. In this way, if all models are eliminated,
we conclude that the performance cannot be met, given our prejudice. Below, an iterative
scheme based on this idea is proposed [7]. This scheme is well defined only if we assume that
the required performance of any controller connected with the real process can be tested by
using a finite number of experiments. Of course, in practice these are the only performance
requirements that we can ever verify.

1.  Pick  a  model  structure  parametrization. 

2.  Collect  a  set  of  data,  and  define  the  set  of  unfalsified  plants. 

3.  Find  a  “large”  subset  of  models  for  which  the  design  procedure  yields  a  controller  that 
delivers  the  required  performance  for  all  models  in  this  set.  If  no  such  set  exists,  go 
back  to  (1)  and  adjust  the  model  structure  and/or  the  performance  objective. 

4.  Test  the  controller  on  the  real  system.  If  the  controller  meets  the  performance,  then 
stop.  If  not,  then  the  above  subset  is  invalidated. 

5.  Use  the  data  acquired  from  testing  the  performance,  as  well  as  other  sets  of  data,  in 
order  to  invalidate  additional  plants. 

6.  Go  to  (3). 
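
The inner loop of the scheme above (steps 3 to 6, with the model structure held fixed) can be
sketched on a toy problem. Everything specific here is an assumption made for illustration:
the unfalsified set is a finite list of static gains, the controller is a feedforward gain
that should drive the output to 1, the performance test is noise-free, and the names
`iterative_design`, `design`, `make_plant_test`, and `falsify` are hypothetical.

```python
TOLERANCE = 0.2   # performance requirement: |y - 1| <= TOLERANCE
DELTA = 0.05      # assumed noise bound used when falsifying models


def design(unfalsified):
    """Step 3: pick a controller and the subset of models it provably serves."""
    if not unfalsified:
        return None
    controller = 1.0 / unfalsified[0]
    subset = [g for g in unfalsified if abs(g * controller - 1.0) <= TOLERANCE]
    return controller, subset


def make_plant_test(true_gain):
    """Step 4: return a function that tests a controller on the 'real' plant."""
    def test_on_plant(controller):
        y = true_gain * controller
        return abs(y - 1.0) <= TOLERANCE, (controller, y)
    return test_on_plant


def falsify(model_gain, data):
    """Step 5: a model is invalidated if it cannot explain the test data."""
    controller, y = data
    return abs(y - model_gain * controller) > DELTA


def iterative_design(models, design, test_on_plant, falsify, max_iters=20):
    """Repeat steps 3-6 until a controller passes or no models remain."""
    unfalsified = list(models)
    for _ in range(max_iters):
        result = design(unfalsified)
        if result is None:
            return None               # back to step 1: change structure or objective
        controller, subset = result
        passed, data = test_on_plant(controller)
        if passed:
            return controller
        unfalsified = [
            m for m in unfalsified if m not in subset and not falsify(m, data)
        ]
    return None


found = iterative_design([0.5, 1.0, 2.0, 4.0], design, make_plant_test(2.0), falsify)
```

In this run the first controller fails on the plant, the failed test removes every candidate
except the true gain, and the second controller meets the requirement. When the true gain is
not among the candidates, the loop ends with `None`, which corresponds to returning to step 1.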

This  scheme  defines  both  an  inner  and  outer  loop.  Within  the  inner  loop,  the  performance 
requirement  and  the  model  structure  parametrization  are  fixed,  and  the  acquisition  of  data, 
as  well  as  the  design  of  controllers  for  subsets  of  the  set  of  unfalsified  models,  continue  to 
reduce  this  set  until  a  controller  is  found,  or  a  decision  that  the  performance  cannot  be  met 
is  made.  We  then  iterate  the  outer  loop.  By  eliminating  large  subsets  in  step  (3),  the  inner 
loop  converges  to  a  decision  much  faster. 

The process of elimination requires the availability of methods for designing robust
controllers for subsets of the set of unfalsified models. It is assumed that for a given subset, a
decision can be made as to whether or not a controller that meets the performance exists. If
the parametrization of models and the performance objective are such that no exact methods
exist, one may use tests based on the existing design methods, as conservative as they may
be. The less conservative the robust control methods are, the smaller the bias of the
above iterative scheme will be.

The step of testing a given controller on the real system generates more sets of data that
can  be  used  to  invalidate  more  models,  within  the  inner  loop  of  the  above  scheme  [32].  We 
may  have  the  ability  to  conduct  more  experiments,  in  which  case  they  have  to  be  devised  in 
such  a  way  that  they  have  sufficient  information  to  invalidate  more  unfalsified  models.  The 
design  of  such  experiments  is  one  of  the  research  directions  to  be  addressed. 

Note  that  the  system  identification  and  control  design  procedures  are  not  distinct  in 
this  framework,  but  intimately  connected.  The  process  of  refining  both  the  model  and  the 
control  design  takes  place  concurrently.  This  allows  us  to  exploit  the  full  power  of  new  set 
membership  identification  procedures,  and  robust  control  design  techniques,  to  work  with 
sets  of  plants  for  both  modeling  and  design.  The  potential  of  this  scheme  for  improving  over 
existing  techniques  arises  largely  from  exploiting  the  connections  between  these  two  fields. 

2.4  Learning  Theory 

Starting  with  the  seminal  work  of  Vapnik  and  Valiant,  there  has  been  a  surge  of  activity 
in  computational  learning  theory,  whose  objective  is  to  characterize  what  can  be  learned, 
and  how  much  information  is  required  for  effective  learning  to  take  place.  Although  this 
theory  had  not  been  linked  to  system  identification  and  control  theory,  the  basic  questions 
raised  are  similar  in  both  areas. 

An  important  factor  in  system  identification  is  that  the  experimenter  can  choose  what 
inputs  to  apply.  In  an  abstract  setting,  this  amounts  to  “active  learning”  whereby  the 
experimenter  has  latitude  as  to  the  type  of  information  to  be  obtained.  Our  results  [28] 
have established that whatever is learnable by active learning is also learnable under
“passive”  learning,  but  active  learning  reduces  the  amount  of  experimentation  required.  In 
addition,  our  work  has  highlighted  the  fundamental  role  of  metric  entropy,  which  leads  to 
some  intriguing  possibilities  of  establishing  a  connection  with  the  control-theoretic  work  of 
Zames  [55]. 

Finally,  in  other  work  [29],  we  have  extended  the  traditional  model  of  computational 
learning  theory  (“PAC  learning”)  by  introducing  and  studying  the  notion  of  “generalized 
samples.” Besides the applications in image analysis that were discussed in [29], such
extensions of the traditional model may prove useful in bridging the gap with the discipline of
system  identification. 
3  Industrial  Interactions


3.1  Professor  Dahleh’s  Industrial  Interactions 

Professor  Dahleh  has  been  involved  in  the  development  of  a  design  methodology  based  on 
the $\ell_1$ theory (an effort supported in part by AFOSR), which has been fully explained in the
recent book [6]. To make this methodology accessible to industry, he has been involved in
developing MATLAB-based software for the synthesis of controllers for plants with uncertainty.
This software has interactive features by which a design can be altered by graphically changing
the various responses or frequency plots of the system. Versions of this software are
currently available through ftp.

Professor  Dahleh  has  been  working  very  closely  with  C.S.  Draper  Laboratory  in  the  areas 
of  robust  control  and  system  identification.  In  the  area  of  robust  control,  he  has  educated 
several groups about the various robust control tools, as well as the $\ell_1$ software. The latter is
now  a  standard  tool  used  by  all  the  engineers  working  in  the  control  division.  Most  recently, 
Professor  Dahleh  and  his  students  have  been  involved  in  the  attitude  control  problem  of  the 
Earth-Observing Satellite. (The $\ell_1$ methodology is the right formulation for this problem
for a variety of reasons. First, the specifications are given in the time domain
in terms of limits on the allowable deviation of attitude angles. Second, the constraints are
in terms of saturating gyros (due to accumulated momentum). Finally, the class of
plant uncertainty includes nonlinearities as well as time variation.) Professor Dahleh and his
students have done designs using both $H_\infty$ and $\ell_1$ and shown that $\ell_1$ can exhibit the limits
and tradeoffs of the design in a much more systematic fashion. In fact, the $H_\infty$ designs had
responses that are quite inferior to the design they exhibited.

In the system identification area, Professor Dahleh has been supervising the implementation
of the recently developed iterative control and identification scheme (which was also
developed under our AFOSR grant). The objective is to develop a CAD environment in
which controllers can be designed directly from data. The controllers are then updated as
more test data is acquired. C.S. Draper Laboratory plans to have such an environment
available as a tool for designing control systems.

Professor Dahleh has also been working very closely with the FIAT research center, and
recently with Ford (a starting effort), on the design of active suspensions. The suitability of
the $\ell_1$ problem is also clear for this application. Professor Dahleh has educated engineers to
help them use the software to design such systems. Also, he has recently done a complete
case study on this problem exhibiting the exact tradeoffs between the specifications and
the  constraints.  Ford  is  also  interested  in  developing  capabilities  for  iterative  identification 
and  control  for  direct  use  in  their  test  environment  (they  have  a  complete  computational 
facility  inside  the  test  cars,  to  update  the  controller  design  from  the  testing  data).  The  setup 
Professor  Dahleh  has  proposed  as  a  result  of  his  research  in  the  system  identification  area 
appears  to  be  quite  attractive. 

Recently,  Professor  Dahleh  in  collaboration  with  other  faculty  at  MIT  has  acquired  a 
contract  from  Siemens  to  develop  capabilities  for  identification  of  nonlinear  systems  and  to 
develop an iterative identification/control environment. Developing software is a major part
of  that  effort. 

Professor Dahleh has also been in close contact with government labs. In particular, he
has very close ties with Dr. Ridgely at Wright Patterson. Dr. Ridgely has been involved in
studying mixed optimization problems and recently has been studying the $\ell_1$ design methodology.
Professor Dahleh has provided him with draft copies of the book [1] as well as access
to the software. Dr. Ridgely taught a course from [6], and many of his students are now
well versed in current robust control theory, including $\ell_1$. It is intended to push this
collaboration further and educate several engineers at WPAFB to use the $\ell_1$ software, which
will be available very soon. This will be accomplished by giving short courses and demonstrations
of the software on site. In addition, Professor Dahleh has developed very close ties
with Dr. Coleman in one of the Army Labs (ARDEC).

Finally, several engineers have already started using Professor Dahleh’s software at
Hughes. Also, several engineers are investigating using the software for noise cancellation
applications (vibration suppression) at BBN. Professor Dahleh has also made initial contacts
with several companies interested in control and identification (e.g., Speyer, which is interested
in semiconductor devices, and elgin Bailey and Bailey controls, which are interested in the
problem of integration of several control systems). In addition, Professor Dahleh’s work in
system  identification  has  had  a  large  impact  on  the  space  lab  at  MIT.  The  objective  of  the 
experiments  is  to  study  the  modeling  problem  for  the  purpose  of  control.  Professor  Dahleh 
has  served  on  several  thesis  committees  and  was  quite  influential  in  guiding  the  research  in 
that  discipline. 
3.2  Professor  Tsitsiklis’  Industrial  Interactions


Although  Prof.  Tsitsiklis'  industrial  interactions  are  not  directly  linked  to  the  core  subjects 
of  the  research  performed  under  this  grant,  there  have  been  extensive  such  interactions  that 
fall  within  the  broader  themes  of  systems  and  control  theory. 

For example, Prof. Tsitsiklis has been working together with the C. S. Draper Laboratory,
towards  the  development  of  hierarchical  control  architectures  for  the  planning  and  operation 
of  advanced  train  control  systems.  This  work  involves  the  application  of  the  decomposition 
methods  described  in  [3],  to  the  large  scale  planning  and  scheduling  problems  faced  by 
railroad  companies. 

One  of  the  directions  towards  which  Prof.  Tsitsiklis’  research  is  moving  is  the  application 
of  function  approximation  methods  in  the  computation  of  the  optimal  cost-to-go  function  of 
dynamic  programming  in  order  to  bypass  the  curse  of  dimensionality  that  plagues  nonlinear 
control  problems.  His  research  in  this  area  is  already  being  transferred  to  the  commercial 
sector by a number of companies dealing with scheduling, resource allocation, and logistics
problems.  This  line  of  research  should  be  of  interest  to  the  Air  Force  on  several  counts. 
First,  because  the  Air  Force  is  faced  with  several  challenging  logistics  problems;  second, 
because  with  the  accumulation  of  experience,  we  expect  to  be  able  to  solve  in  the  near 
future,  nontrivial  problems  involving  the  control  of  complex  dynamical  systems. 

In  another  effort,  Prof.  Tsitsiklis  and  two  more  M.I.T.  faculty  have  launched  a  research 
program  with  the  Groupe  Schneider  and  Square  D  whose  goal  is  to  “reengineer”  the  basic 
architectures  used  in  industrial  automation  and  to  envision  the  technology  that  will  take  the 
place of Programmable Logic Controllers (PLCs). This research is taking place in the context
of  frequent  site  visits  and  close  technical  interaction  with  Groupe  Schneider  engineers. 

Finally, Prof. Tsitsiklis has initiated a collaboration with faculty in the M.I.T. department
of chemical engineering whose goal is to apply neural network techniques for the analysis
of  pharmaceutical  process  data,  with  the  aim  of  identifying  “signatures”  that  can  be  used 
for  early  prediction  of  the  performance  of  a  batch  as  well  as  of  identifying  control  variables 
that  can  be  manipulated  so  as  to  enhance  performance. 

In  conclusion,  the  work  of  Professors  Dahleh  and  Tsitsiklis  has  been  coupled  directly  with 
several industrial activities. These activities have been quite extensive and have shaped
our present research directions. In addition, both are working closely with Dr.
Gunter Stein, who has been instrumental in shaping the research effort in the control area at
MIT.

4  Educational  Impact 

This research has supported three excellent Master's theses and one Ph.D. thesis in the area of
system identification. The first S.M. thesis was by David Tse, in which the problem of
worst-case identification in the presence of bounded noise was completely covered. The second
S.M. thesis was by Theodore Theodosopoulos, in which the problem of sample complexity of
worst-case identification was formulated and solved. The third S.M. thesis was by Ian Chen,
which generalized Tse’s results for bounded noise with low correlations. The Ph.D. thesis,
partially supported by this grant, formulated and discussed the problem of iterative
identification and control.

In the area of robust control, this grant supported in part one major Ph.D. thesis by
Ignacio Diaz-Bobillo, which contained a major development of the $\ell_1$ theory. The software
development was also a result of this work. Also, this research supported in part an S.M.
thesis that applied the software to the earth-observing system (EOS-AM). The latter thesis
demonstrated the power of the $\ell_1$ theory in achieving high precision in pointing applications.

This  grant  also  supported  in  part  the  work  on  the  book  [6].  This  book  is  now  used  in 
several  universities  and  industrial  laboratories. 
Abstract - This paper investigates the intrinsic limitation of
worst-case identification of LTI systems using data corrupted by
bounded disturbances, when the unknown plant is known to
belong to a given model set. This is done by analyzing the
optimal worst-case asymptotic error achievable by performing
experiments using any bounded inputs and estimating the plant
using any identification algorithm. First, it is shown that under
some topological conditions on the model set, there is an identification
algorithm which is asymptotically optimal for any input.
Characterization of the optimal asymptotic error as a function
of the inputs is also obtained. These results hold for any error
metric and disturbance norm. Second, these general results are
applied to three specific identification problems: identification
of stable systems in the $\ell_1$ norm, identification of stable rational
systems in the $H_\infty$ norm, and identification of unstable rational
systems in the gap metric. For each of these problems, the
general characterization of optimal asymptotic error is used to
find near-optimal inputs to minimize the error.


I.  Introduction 

Recently, there has been a growing line of work
with the common theme that system identification
should be performed so that the worst-case error of the
resulting model is small in a metric compatible with
robust control [8]-[10], [26], [37]. This paper addresses the
questions of asymptotically optimal identification algorithms
and experiment designs from this point of view.
Our emphasis is less on finding efficient algorithms and
more on finding the fundamental limitations in identification
accuracy achievable by any identification algorithm in
the limit of observing more and more data corrupted by
nonstochastic noise. Thus, this work is in the flavor of the
questions posed by Zames [41].

We will deal exclusively with discrete-time, single-input
single-output linear time-invariant systems. In this formulation,
the unknown plant is a priori known to be in a
certain subset $\mathcal{M}$ of the space of all LTI systems; this
subset will be called a model set. The model set is
endowed with a general metric $\rho$ which can be any
uncertainty measure suitable for designing robust controllers.
To identify the plant, one is allowed to perform
one or more finite but arbitrarily long experiments using
input sequences chosen from a given input set $U$. (Typically,
$U$ is some norm-bounded set.) The measured outputs
are corrupted with additive disturbance sequences
which are bounded in an $\ell_p$ norm $\|\cdot\|_p$ but can otherwise
be arbitrary. The problem is to analyze the smallest
worst-case error, over all plants in $\mathcal{M}$ and all admissible
disturbances, achievable by using any inputs from $U$ and
any identification algorithm to estimate the plant from
arbitrarily long but finite data records (i.e., asymptotic
error). Our goal is to investigate the key properties of
model sets which can be identified with a small optimal
error, and in particular how large the model set can be to
still yield a finite optimal error. Furthermore, we are
interested in robustness issues: does the optimal error
vanish as the bound on the output disturbance decreases
to zero? Answers to these questions give a characterization
of the difficulty of identification using a given model
set.

A  natural  framework  to  study  worst-case  identification 
is  provided  by  information-based  complexity  theory  [21], 
[35],  [36].  This  theory  provides  a  general  mathematical 
framework  for  analyzing  the  optimal  error  achievable  in 
solving a problem using a given amount of possibly inaccurate
and partial information. Information plays the central
role in this theory: the results depend only on the
information used by an algorithm but are independent of
its structure. Our work, like many others in worst-case
identification,  has  employed  some  of  the  basic  concepts  of 
this  theory,  but  the  key  results  we  derived  are  completely 
new. 

Although mainstream system identification research
adopts stochastic models for the noise, there is a line of
work which deals with worst-case identification under
bounded disturbances [5], [16], [22]-[24], [28], [32], [15].
More recently, specific identification algorithms have been
proposed in [8]-[10], [26] for worst-case identification in the
$H_\infty$ metric from noisy frequency response data and in [12],
[25] for identification in the $\ell_1$ metric from time series
data. In contrast to these works, we deal with general
aspects of optimal worst-case asymptotic identification in
a general error metric. Moreover, the issue of optimal
experiment design, although considered in stochastic system
identification (e.g., [7], [20], [43]), has not been satisfactorily
addressed in the worst-case setting. Issues of
complexity and tradeoffs between the length of experiments
and accuracy have been recently reported in [3], [13],
[18], [31].
The contributions of this paper are twofold. At a
more general level, it introduces a framework for the
analysis of optimal worst-case asymptotic error under
bounded disturbances. The central result here is that,
under some topological conditions on the model set,
infinite-horizon experiments, where the entire infinite data
record is available to compute estimates, can be viewed as
a limit of finite-horizon experiments, where only finite
data records are available. Analysis of optimal asymptotic
error is then reduced to finding optimal inputs to minimize
the worst-case error for the infinite-horizon problem.

At a more specific level, concrete results are obtained by
applying the general framework to three specific identification
problems: identification of stable systems in the $\ell_1$
and $H_\infty$ metrics, and identification of unstable systems in
the gap metric. In all these problems, the required topological
conditions for consistency are verified and the
infinite-horizon problem is analyzed to find good input
designs.

The organization of the paper is as follows. In Section
II, the identification problem is formulated and the optimal
worst-case asymptotic error achievable by any identification
algorithm is defined. In Section III, we present
consistency results establishing infinite-horizon experiments
as limits of finite-horizon ones. In Section IV, the
general results developed are applied to analyze three
specific identification problems. Section V contains our
conclusion.

II.  Problem  Formulation 

Let $\mathcal{X}$ be the class of all causal, single-input
single-output, linear time-invariant, discrete-time systems.
We identify $\mathcal{X}$ with the space of all one-sided
real-valued sequences, $\mathbb{R}^\infty$. Let $\mathcal{M} \subset \mathcal{X}$ be the model set
which is assumed to contain the unknown plant $h$ to be
identified. The set $\mathcal{M}$ captures the experimenter's a priori
knowledge about $h$. Some examples of $\mathcal{M}$ are the set of
all stable systems, the set of stable systems with a bound
on the decay rate, the set of all finite-dimensional systems
with a bound on the order, etc. Also given is an input set
$U$ which contains all the input sequences that can be used
in the identification experiments. Typically, $U$ is a norm-bounded
set, to reflect physical limitations, power restrictions,
safety, or to maintain the validity of the linear
model of the plant.

An experiment is conducted by choosing an input sequence
$u \in U$ and measuring the output sequence $y$,
related to $u$ by

$$y = h * u + d \tag{2.1}$$

where $*$ denotes the convolution operator and $d$ is the
disturbance sequence which corrupts the measurements.
(Note that $h, u, y, d$ are all one-sided real-valued sequences:
$h = (h_0, h_1, h_2, \ldots)$, etc.) The disturbance $d$ is
assumed to be bounded in a given norm, $\|d\|_p \le \delta$ for
some known $\delta$, but can otherwise be arbitrary. The disturbance
may arise from actual measurement noise, such as
quantization, or it may reflect nonlinearities and time
variation of the plant. In the latter case, the true plant is
actually nonlinear and time varying but is assumed to be
approximated well at the operating range by an LTI
component, which is the object of identification.
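
As a rough illustration of the measurement model (2.1), the following Python sketch simulates one experiment. The plant `h`, the ±1 input, the choice $p = \infty$, and the uniformly random disturbance are all assumptions made for this example; the formulation admits any disturbance within the bound, including adversarial ones, and the finite array `h` merely stands in for an infinite impulse response.

```python
import numpy as np

def run_experiment(h, u, delta, rng):
    """Return the first len(u) samples of y = h * u + d with ||d||_inf <= delta."""
    n = len(u)
    clean = np.convolve(h, u)[:n]            # causal convolution, cut to the record length
    d = rng.uniform(-delta, delta, size=n)   # one admissible disturbance, not the worst case
    return clean + d, d

rng = np.random.default_rng(0)
h = 0.5 ** np.arange(20)                 # hypothetical stable plant, h_k = 0.5^k (20 taps kept)
u = rng.choice([-1.0, 1.0], size=50)     # input with ||u||_inf <= 1
y, d = run_experiment(h, u, delta=0.1, rng=rng)
```

The system is taken to be at rest before time 0, matching the assumption stated in the text.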

One point to note is that we assume that the system is
initially at rest before an experiment is started. Having an
unknown nonzero initial condition is equivalent to having
an additional, unknown, additive disturbance $u^- * h$,
where $u^-$ is the (unknown) input before time $t = 0$. If the
model set $\mathcal{M}$ is bounded in the operator norm from the
input space to the disturbance space, then $u^- * h$ is
bounded if $u^-$ is, and this additional uncertainty can be
accounted for by grouping it into the original additive disturbance
term. If this is not the case, however, then the
problem cannot be treated in the present framework.

Now suppose $N$ such independent experiments are
performed. (The question whether more than one input is
needed to identify plants in a given model set will be
addressed.) We then have:

$$y^{(i)} = u^{(i)} * h + d^{(i)}, \qquad i = 1, 2, \ldots, N \tag{2.2}$$

where $y^{(i)}$ and $d^{(i)}$ are the output and disturbance sequences
in the $i$th experiment. This can be written in a
more compact notation:

$$y = u * h + d, \qquad \|d\|_p = \max_i \|d^{(i)}\|_p \le \delta \tag{2.3}$$

where $y = [y^{(1)}, \ldots, y^{(N)}]$, $u = [u^{(1)}, \ldots, u^{(N)}]$ and
$d = [d^{(1)}, \ldots, d^{(N)}]$ are vectors of sequences; convolution of $h$
with a vector of inputs is just element-wise convolution
with every input. Also note that the vector of inputs $u$ is
in $U^N$.

An identification algorithm is a mapping $\phi$ which generates,
at each time instant $n$, an estimate $\hat{h}_n =
\phi(P_n u, P_n y) \in \mathcal{X}$ of the unknown plant $h$, given the input
and output sequences in the experiments. Here, $P_n$ is the
truncation operator, defined by $P_n x = (x_0, x_1, \ldots, x_n)$ for
each infinite sequence $x$. Its use signifies that the algorithm
$\phi$ generates at each time instant an estimate based
only on the input-output data it has seen so far. Generally,
we will assume that the algorithm has access to what
the model set $\mathcal{M}$ is and also the value of $\delta$, the bound on
the disturbance. In the terminology of Helmicki et al. [12],
the algorithm is tuned. However, in some cases, we will be
able to give stronger results using algorithms which are
untuned to the value of $\delta$.

Also given is an extended metric $\rho(\cdot, \cdot)$ on $\mathcal{X}$,
$\rho: \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+ \cup \{\infty\}$, which evaluates the accuracy of $\hat{h}_n$ as an
estimate of $h$.
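
To make the role of the truncation operator concrete, here is a minimal Python sketch of an algorithm that only sees $P_n u$ and $P_n y$. The least-squares FIR fit is a stand-in chosen for brevity and is not the asymptotically optimal algorithm studied in the paper; the function names, the FIR order, and the example plant are all hypothetical.

```python
import numpy as np

def truncate(x, n):
    """P_n x = (x_0, ..., x_n): the first n + 1 samples of x."""
    return np.asarray(x, dtype=float)[: n + 1]

def fir_least_squares(u_n, y_n, order):
    """Hypothetical algorithm phi: fit `order` FIR taps to the truncated data."""
    m = len(u_n)
    # Lower-triangular Toeplitz matrix T with (T g)_k = sum_j u_{k-j} g_j.
    T = np.zeros((m, order))
    for j in range(order):
        T[j:, j] = u_n[: m - j]
    g, *_ = np.linalg.lstsq(T, y_n, rcond=None)
    return g

rng = np.random.default_rng(1)
h = np.array([1.0, -0.5, 0.25])                  # assumed true plant (3 taps)
u = rng.choice([-1.0, 1.0], size=200)
y_clean = np.convolve(h, u)[:200]
y = y_clean + rng.uniform(-0.05, 0.05, size=200)  # bounded disturbance, delta = 0.05

# Estimates at increasing n use only the data available up to that time.
estimates = {n: fir_least_squares(truncate(u, n), truncate(y, n), order=3)
             for n in (9, 49, 199)}
```

Because the estimate at time `n` is a function of the truncated records alone, the sketch respects the causality requirement that the definition of $\phi$ imposes.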

Given an identification algorithm and a chosen set of
input sequences for the experiments, we would like to
consider the limiting situation when longer and longer portions of
the output sequences are observed. To this end, the
worst-case asymptotic error is defined as follows.

Definition 2.1: Fix the inputs $u$. The worst-case asymptotic
error, $e_\infty(\phi, \mathcal{M}, u, \delta)$, of an algorithm $\phi$ is the smallest
number $r$ such that for all plants $h \in \mathcal{M}$ and for all
disturbances $d$ with $\|d\|_p \le \delta$,

$$\limsup_{n \to \infty} \rho(\phi(P_n u, P_n(u * h + d)), h) \le r.$$

Equivalently,

$$e_\infty(\phi, \mathcal{M}, u, \delta) = \sup_{h \in \mathcal{M}} \sup_{\|d\|_p \le \delta} \limsup_{n \to \infty} \rho(\phi(P_n u, P_n(u * h + d)), h).$$

According to this definition, no matter what the true
plant and the disturbances are, the plant can be eventually
approximated to within $e_\infty(\phi, \mathcal{M}, u, \delta)$, using the estimates
generated by the identification algorithm. This is
quite  analogous  to  the  notion  of  convergence  of  estimates 
to  the  true  plant  in  the  classical  probabilistic  framework 
of  identification.  However,  since  the  disturbances  here  are 
assumed  to  be  arbitrary  and  not  necessarily  stationary, 
such  convergence  is  not  possible  in  general.  Instead,  we 
only  require  the  estimates  to  enter  and  stay  within  a  ball 
around  the  true  plant  rather  than  to  converge  to  the  exact 
plant. 

In the above definition of the worst-case asymptotic
error, although convergence of the estimates to within
$e_\infty(\phi, \mathcal{M}, u, \delta)$ is guaranteed for all admissible plants and
disturbance sequences, the rate of convergence may be
arbitrarily slow for some plants and some disturbances.
The worst-case asymptotic error is said to be uniform if
the rate of convergence is uniform over all admissible
plants and disturbance sequences. If the convergence is
uniform, the worst-case asymptotic error defined above is
the same as the limit of the worst-case error taken at each
finite time $n$, i.e.,

$$\sup_{h \in \mathcal{M}} \sup_{\|d\|_p \le \delta} \limsup_{n \to \infty} \rho(\phi(P_n u, P_n(u * h + d)), h) = \limsup_{n \to \infty} \sup_{h \in \mathcal{M}} \sup_{\|d\|_p \le \delta} \rho(\phi(P_n u, P_n(u * h + d)), h).$$

This allows one to determine a priori the experiment
length required to guarantee that any plant in the model
set can be identified to a prescribed accuracy. It is the
notion of convergence considered by Helmicki et al. in
their framework [11].

Demanding uniform convergence is too restrictive a
formulation for a general theory of fundamental limitations
of worst-case identification. Although such uniform
convergence is certainly desirable, it is impossible to
achieve for many interesting model sets. In fact, for many
inherently infinite-dimensional model sets, the worst-case
error at each finite time is always infinite, while the
worst-case asymptotic error can be made small using an
appropriate identification algorithm and inputs. Our formulation
thus allows us to discuss optimal worst-case
identification and optimal inputs for a much broader class
of model sets. Besides, in some applications of identification,
such as adaptive control, uniform convergence of
estimates is not necessary to fulfill the desired objectives.
However, because of the special importance of uniform
convergence, we will give additional conditions on the
model set for this to take place. It will be seen that these
conditions are quite strong and essentially require the
model set to be finite-dimensional. It is worthwhile to
note that the model set considered in [8], [9] satisfies these
conditions.

The optimal worst-case asymptotic error $E_\infty(u, \mathcal{M}, \delta)$ is
defined as the smallest error achievable by any algorithm:

$$E_\infty(u, \mathcal{M}, \delta) = \inf_{\phi} e_\infty(\phi, \mathcal{M}, u, \delta).$$

Any algorithm for which the infimum is attained is said to
be asymptotically optimal. We will obtain a general characterization
of the asymptotically optimal algorithms and
the resulting optimal worst-case asymptotic error, for given
inputs $u$. For specific problems, we will find conditions on
the inputs $u$ to make this optimal worst-case asymptotic
error small.

It  should  be  noted  that  the  asymptotically  optimal  algo¬ 
rithms  to  be  derived  are  valid  for  arbitrary  inputs  u.  This 
allows  the  complete  separation  of  the  problem  of  devising 
optimal  algorithms  and  the  problem  of  designing  optimal 
inputs.  This  is  particularly  important  when  there  is  no 
complete  control  over  the  choice  of  the  inputs  into  the 
plants,  such  as  in  closed-loop  experiments  or  in  adaptive 
control. In these problems, this “separation principle” facilitates
the derivation of necessary conditions on the
input  signals  for  accurate  identification  to  take  place. 

We  would  also  like  to  point  out  that  there  are  some 
recent  asymptotic  optimality  results  in  the  general 
information-based complexity framework [14]. However,
their notion of optimality is that of the rate of convergence
of the worst-case error for any fixed problem element,
and their results only make sense if the error converges to
zero. In contrast, in the worst-case identification problem
we are dealing with, the error does not typically converge
to  zero,  and  our  notion  of  optimality  is  that  of  the 
nonzero  limit  supremum  of  the  error. 

III.  Asymptotically  Optimal  Identification 

In this section, the inputs will be assumed to be fixed.
The characterization of asymptotically optimal algorithms
and optimal worst-case asymptotic error is in terms of the
uncertainty set, an important
notion in information-based complexity theory.

Definition 3.1: Let $u$ and $y$ be the input and measured
output sequences, and $\delta$ be the bound on the disturbances.
The finite-horizon uncertainty set at time $n$ is
defined to be

$$S_n(\mathcal{M}, u, y, \delta) = \{ g \in \mathcal{M} : \|P_n(u * g - y)\|_p \le \delta \}$$

and the infinite-horizon uncertainty set is

$$S_\infty(\mathcal{M}, u, y, \delta) = \{ g \in \mathcal{M} : \|u * g - y\|_p \le \delta \}.$$

The set $S_n$ contains all the plants in the model set
consistent with the output data seen until time $n$. It
characterizes the uncertainty at time $n$: any plant in $S_n$
can be the actual plant from the experimenter’s point of
view. Similarly, $S_\infty$ contains all the plants that are consistent
with the entire output sequences. It measures the
uncertainty that the experimenter would still have even if
he could perform infinitely long experiments and could
see the entire output record. It is easy to see that the
finite-horizon uncertainty sets become smaller with increasing
$n$.
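
The shrinking of the finite-horizon uncertainty sets can be seen on a toy problem. The Python sketch below assumes $p = \infty$ and replaces the model set by a small finite list of candidate FIR plants; the candidates, the input, and the function names are hypothetical and serve only to illustrate Definition 3.1.

```python
import numpy as np

def in_uncertainty_set(g, u, y, delta, n):
    """True if ||P_n(u * g - y)||_inf <= delta, i.e. g is unfalsified by data up to time n."""
    residual = np.convolve(u, g)[: n + 1] - y[: n + 1]
    return bool(np.max(np.abs(residual)) <= delta)

def uncertainty_set(candidates, u, y, delta, n):
    """Indices of the candidate plants that belong to S_n."""
    return [i for i, g in enumerate(candidates)
            if in_uncertainty_set(g, u, y, delta, n)]

candidates = [np.array([1.0, a]) for a in (-0.5, 0.0, 0.5)]  # finite stand-in for the model set
u = np.array([1.0, 1.0, -1.0, 1.0])
y = np.convolve(u, candidates[2])[:4]       # noise-free data generated by the third candidate

S0 = uncertainty_set(candidates, u, y, delta=0.2, n=0)   # all three candidates agree at time 0
S3 = uncertainty_set(candidates, u, y, delta=0.2, n=3)   # only the true plant survives
```

Here `S3` is a subset of `S0`, which is the nesting property noted in the text.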

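For any set $A \subset \mathcal{X}$, define the diameter and radius of
the set $A$ as

$$\operatorname{diam}(A) = \sup_{g, h \in A} \rho(g, h),$$

$$\operatorname{rad}(A) = \inf_{g \in \mathcal{X}} \sup_{h \in A} \rho(g, h).$$

Note that $\operatorname{diam}(A)/2 \le \operatorname{rad}(A) \le \operatorname{diam}(A)$. We shall
now define two important quantities.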
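
The two bounds on the radius can be checked numerically on a small example. The Python sketch below works with a finite set of points on the real line under the metric $|g - h|$, and restricts the infimum in the radius to a finite list of candidate centers; both simplifications are assumptions made for illustration only.

```python
from itertools import product

def diam(A, rho):
    """diam(A): the largest distance between two elements of a finite set A."""
    return max(rho(g, h) for g, h in product(A, repeat=2))

def rad(A, centers, rho):
    """rad(A) with the infimum restricted to a finite list of candidate centers."""
    return min(max(rho(g, h) for h in A) for g in centers)

rho = lambda g, h: abs(g - h)     # illustrative metric on the real line
A = [0.0, 1.0, 4.0]

d = diam(A, rho)                  # 4.0
r_in_A = rad(A, A, rho)           # 3.0: centers restricted to A itself
r_free = rad(A, A + [2.0], rho)   # 2.0: the midpoint 2.0 attains diam(A)/2
```

Restricting the center to lie in `A` can only increase the radius, which is why the upper bound $\operatorname{diam}(A)$ still holds in that case, while allowing the midpoint as a center reaches the lower bound $\operatorname{diam}(A)/2$.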

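Definition 3.2: Given a choice of the inputs $u$, define
the infinite-horizon diameter of information $D(u, \mathcal{M}, \delta)$
and radius of information $R(u, \mathcal{M}, \delta)$ to be respectively
the diameter and radius of the largest possible uncertainty
set:

$$D(u, \mathcal{M}, \delta) = \sup_{h \in \mathcal{M}} \sup_{\|d\|_p \le \delta} \operatorname{diam}(S_\infty(\mathcal{M}, u, u * h + d, \delta))$$

$$R(u, \mathcal{M}, \delta) = \sup_{h \in \mathcal{M}} \sup_{\|d\|_p \le \delta} \operatorname{rad}(S_\infty(\mathcal{M}, u, u * h + d, \delta)).$$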

In information-based complexity terminology, these
quantities correspond to the diameter and radius of information
for the infinite-horizon problem where the information
available is the entire infinite output sequence.
The quantity $D(u, \mathcal{M}, \delta)$ is the largest distance between
two plants for which there are admissible disturbances
such that the plants give exactly the same outputs. It turns
out that it is precisely this quantity that characterizes the
optimal worst-case asymptotic errors. First we show that
half the infinite-horizon diameter of information is a
lower bound to the optimal asymptotic error.

Proposition 3.3: Let $\mathcal{M}$ be any model set, $u$ be any
vector of inputs and $\delta > 0$. Then

$$e_\infty(\phi, \mathcal{M}, u, \delta) \ge D(u, \mathcal{M}, \delta)/2$$

for any algorithm $\phi$.

Proof: Let $\psi$ be an algorithm for the infinite-horizon problem, i.e., given the entire input and output sequences, $\psi$ generates an estimate for the plant. The worst-case error achieved by this algorithm is

$$\sup_{h \in \mathcal{M}} \sup_{\|d\|_\infty \le \delta} \rho(\psi(u, u * h + d), h)$$

and the infinite-horizon optimal worst-case error achievable by any algorithm is

$$\inf_{\psi} \sup_{h \in \mathcal{M}} \sup_{\|d\|_\infty \le \delta} \rho(\psi(u, u * h + d), h). \quad (3.4)$$

One should note that while the algorithms allowed in this infinite-horizon problem have access to the entire infinite input-output sequences, the algorithms for the asymptotic problem have access to only finite but arbitrarily long portions. Consequently, the infinite-horizon optimal worst-case error lower bounds the optimal asymptotic error $E_\infty(u, \mathcal{M}, \delta)$. On the other hand, by a central result in information-based complexity theory [35], this infinite-horizon optimal error is given by the infinite-horizon radius of information $R(u, \mathcal{M}, \delta)$, which in turn is lower bounded by half the diameter of information $D(u, \mathcal{M}, \delta)$. Hence, the result follows.

The key question now is whether there exists an optimal algorithm which can always generate estimates with error converging to this lower bound. By the definition of the infinite-horizon uncertainty set, there exist two plants at a separation of $D(u, \mathcal{M}, \delta)$ which can give rise to exactly the same output measurements. Thus in the worst case, there is no way for any finite-duration experiments to distinguish between them, and this gives rise to the lower bound proved above. Conversely, any two plants with a separation greater than $D(u, \mathcal{M}, \delta)$ can be distinguished if we perform experiments of sufficiently long length. That is, if $h$ is the true plant, and $h'$ is another plant which is far away from $h$ (separation greater than $D(u, \mathcal{M}, \delta)$), there exists a time $T(h')$ for which one needs to observe the output to eliminate $h'$ from consideration as a possible candidate. However, to guarantee that an accurate estimate at time $n$ can be obtained, one needs $T(h') \le n$ for all plants $h'$ that are far away from $h$. Otherwise, although the identification algorithm always picks estimates which are consistent with the output seen so far, the estimates may nevertheless diverge from the true plant.

The  issue  discussed  above  is  really  one  of  consistency 
between  finite-horizon  experiments,  where  only  a  finite 
data  record  is  available  for  computing  estimates,  and 
infinite-horizon  experiments,  where  the  entire  infinite  data 
record  is  available.  The  question  is  when  the  latter  can  be 
viewed  as  a  limit  of  the  former.  In  [17],  such  a  consistency 
result  is  established  by  placing  a  stationarity  assumption 
on  the  noise  and  then  appealing  to  the  law  of  large 
numbers.  As  far  as  we  know,  this  issue  has  not  been 
considered  in  an  unknown-but-bounded  noise  setting.  In 
fact,  it  will  now  be  shown  that  a  compactness  condition  on 
the  model  set  will  guarantee  consistency. 

The following theorem shows that, under a $\sigma$-compactness assumption on $\mathcal{M}$, $D(u, \mathcal{M}, \delta)$ is an upper bound for the optimal asymptotic error. Combining with Proposition 3.3, we have upper and lower bounds that agree, within a factor of 2. Thus, the study of the optimal asymptotic error is reduced to the study of $D(u, \mathcal{M}, \delta)$, if we ignore this factor of 2.

Theorem 3.4: Suppose that the model set $\mathcal{M}$ is $\sigma$-compact in the $\rho$-topology, $\mathcal{M} = \bigcup_i \mathcal{M}_i$, $\mathcal{M}_i \subset \mathcal{M}_{i+1}$ for all $i$, each $\mathcal{M}_i$ compact, and on each $\mathcal{M}_i$ convergence in the $\rho$-topology implies component-wise convergence of the impulse response. Then there is an identification algorithm $\phi^*$ such that $e_\infty(\phi^*, \mathcal{M}, u, \delta) \le D(u, \mathcal{M}, \delta)$ for all $u$ and $\delta > 0$.

It should be noted that by an elementary result in information-based complexity theory, the optimal worst-case error achievable when the algorithm has full access to the entire infinite input-output sequences is also bounded between the infinite-horizon diameter of information and half the diameter of information. Our two results (Proposition 3.3 and Theorem 3.4) are of an entirely different nature: they assert that the optimal worst-case asymptotic error achievable when the algorithm has access to finite but arbitrarily long data records also satisfies the same bounds. The assumed topological conditions are crucial for the validity of Theorem 3.4.

Before proving Theorem 3.4, we need one more definition and a few lemmas.

Definition 3.5: For given inputs $u$ and bound $\delta$ on disturbances, and $g, h \in X$, define $T_{u,\delta}(g,h)$ to be the smallest integer $k$ such that $\|P_k(u * (g - h))\|_\infty > 2\delta$. If no such $k$ exists, then $T_{u,\delta}(g,h)$ is infinite.

Lemma 3.6: For any two plants $g, h \in \mathcal{M}$, $T_{u,\delta}(g,h)$ is the smallest $k$ such that there is no output $y$ with $g$ and $h$ in the same uncertainty set $S_k(\mathcal{M}, u, y, \delta)$.

Proof: If $n \ge T_{u,\delta}(g,h)$, then $\|P_n(u * (g - h))\|_\infty > 2\delta$, so for every output sequence $y$, either $\|P_n(u * g - y)\|_\infty > \delta$ or $\|P_n(u * h - y)\|_\infty > \delta$, by the triangle inequality. Hence, $g$ and $h$ cannot be in the same uncertainty set $S_n(\mathcal{M}, u, y, \delta)$ for any $y$. Conversely, if $n < T_{u,\delta}(g,h)$, then $\|P_n(u * (g - h))\|_\infty \le 2\delta$, so picking $y = u * (g + h)/2$ yields $\|P_n(u * g - y)\|_\infty \le \delta$ and $\|P_n(u * h - y)\|_\infty \le \delta$. Hence, $g, h \in S_n(\mathcal{M}, u, y, \delta)$. □

Thus, given two plants $g$ and $h$, $T_{u,\delta}(g,h)$ is the minimum duration for which one has to observe the output to ensure that at least one of the two plants can be eliminated from consideration as the true plant.
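For finite impulse responses and a finite input record, the separation time of Definition 3.5 can be computed directly. The sketch below is illustrative only and rests on two assumptions not fixed by the text here: $P_k$ is taken to keep the first $k$ samples (indices $0, \ldots, k-1$), and the search stops at the end of the available input, returning `None` when no separation has occurred by then. The names `convolve` and `separation_time` are hypothetical.

```python
def convolve(u, h, n):
    """First n samples of the causal convolution u * h."""
    return [
        sum(u[j - i] * h[i]
            for i in range(min(j, len(h) - 1) + 1)
            if j - i < len(u))
        for j in range(n)
    ]


def separation_time(u, g, h, delta):
    """Smallest k with ||P_k(u * (g - h))||_inf > 2 * delta.

    Returns None if the threshold is not exceeded within len(u) samples
    (the true value may then be larger or infinite).
    """
    m = max(len(g), len(h))
    diff = [(g[i] if i < len(g) else 0.0) - (h[i] if i < len(h) else 0.0)
            for i in range(m)]
    out = convolve(u, diff, len(u))
    for k, v in enumerate(out, start=1):
        if abs(v) > 2 * delta:
            return k
    return None


u = [1, 1, -1, 1]
g = [1.0, 0.5]
h = [0.0, 0.0]
t_small = separation_time(u, g, h, 0.4)   # threshold 0.8
t_mid = separation_time(u, g, h, 0.6)     # threshold 1.2
t_large = separation_time(u, g, h, 1.0)   # threshold 2.0
```

With this input, $u * (g - h)$ starts $1, 1.5, -0.5, 0.5$, so a larger noise bound delays the time at which the two plants can be told apart, and for $\delta = 1$ they are not separated within the record.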

Lemma 3.7: Let $g, h \in \mathcal{M}$. If $\rho(g,h) > D(u, \mathcal{M}, \delta)$, then $T_{u,\delta}(g,h) < \infty$.

Proof: Suppose $T_{u,\delta}(g,h) = \infty$. Then $\|P_k(u * (g - h))\|_\infty \le 2\delta$ for every $k$, so $\|u * (g - h)\|_\infty \le 2\delta$. Now consider the disturbance $d = u * (h - g)/2$, and the infinite-horizon uncertainty set $S_\infty(\mathcal{M}, u, u * g + d, \delta)$, arising when $g$ is the true plant. (Note that $\|d\|_\infty \le \delta$.) But $\|u * h - (u * g + d)\|_\infty = \|u * (h - g)/2\|_\infty \le \delta$, so the plant $h$ is also in the set $S_\infty(\mathcal{M}, u, u * g + d, \delta)$. Hence, by definition of the infinite-horizon diameter of information, $\rho(g,h) \le D(u, \mathcal{M}, \delta)$. □

The desired topological condition involves the topology of component-wise convergence of sequences, or the so-called product topology [27].

Lemma 3.8: Fix the inputs $u \in Bl_\infty$ and $\delta > 0$. Let $A \subset \mathcal{M} \times \mathcal{M}$ be compact in the product topology, and suppose $T_{u,\delta}(g,h)$ is finite for every $(g,h) \in A$. Then $\sup_{(g,h) \in A} T_{u,\delta}(g,h)$ is also finite.

Proof: Suppose $\sup_{(g,h) \in A} T_{u,\delta}(g,h) = \infty$. Then there exists a sequence of plants $(g^{(n)}, h^{(n)})$ in $A$ such that $\lim_{n \to \infty} T_{u,\delta}(g^{(n)}, h^{(n)}) = \infty$; furthermore, the sequence can be assumed to converge (in the product topology) to a pair of plants $(g^*, h^*) \in A$ since $A$ is compact. Let $n^* = T_{u,\delta}(g^*, h^*) < \infty$. By definition, $\|P_{n^*}(u * (g^* - h^*))\|_\infty > 2\delta$. Since the norm of the truncated sequence depends continuously on only finitely many components of $g$ and $h$, it follows that $\|P_{n^*}(u * (g - h))\|_\infty$ is a continuous function of $(g,h)$ in the product topology. Hence, there exists a ball $B$ (in the product topology) around $(g^*, h^*)$ such that for every $(g', h') \in B$, $\|P_{n^*}(u * (g' - h'))\|_\infty > 2\delta$, i.e., $T_{u,\delta}(g', h') \le n^*$ for every $(g', h') \in B$. But this contradicts the fact that $\lim_{n \to \infty} T_{u,\delta}(g^{(n)}, h^{(n)}) = \infty$ since $(g^{(n)}, h^{(n)}) \to (g^*, h^*)$. Hence, it can be concluded that $\sup_{(g,h) \in A} T_{u,\delta}(g,h)$ is in fact finite. □

Basically, this lemma says that if each plant in the compact set $A$ can be eventually ruled out as the true plant, there is a finite time after which all of them can be simultaneously ruled out.

Now  we  are  in  a  position  to  prove  Theorem  3.4. 

Proof: Define the identification algorithm $\phi^*$ as follows: at each time $n$, the algorithm generates an estimate by picking any arbitrary plant $h^{(n)}$ in the set $S_n \cap \mathcal{M}_k$, where $S_n$ is the uncertainty set after observing the output data until time $n$, and $k$ is the least integer $i$ such that $S_n \cap \mathcal{M}_i$ is nonempty. We claim that this algorithm will have an asymptotic error of at most $D(u, \mathcal{M}, \delta)$ for all inputs $u$ and $\delta > 0$.

Fix the unknown plant $h \in \mathcal{M}$ and let $\epsilon > 0$. Also let $\mathcal{M}_{k_h}$ be the smallest of the compact subsets $\mathcal{M}_i$ which contains $h$. Define the set

$$A(h, \epsilon) = \{g \in \mathcal{M}_{k_h} : \rho(g,h) \ge D(u, \mathcal{M}, \delta) + \epsilon\} \quad (3.5)$$

and the number

$$T(h, \epsilon) = \sup_{g \in A(h,\epsilon)} T_{u,\delta}(g,h). \quad (3.6)$$

Since $A(h, \epsilon)$ is a closed subset of $\mathcal{M}_{k_h}$ (with respect to the $\rho$-topology), it is also compact in the $\rho$-topology. Since the $\rho$-topology is finer than the product topology in $\mathcal{M}_{k_h}$, $A(h, \epsilon)$ is also compact in the product topology. By Lemma 3.7, $T_{u,\delta}(g,h)$ is finite for all $g \in A(h, \epsilon)$. Hence, by Lemma 3.8, $T(h, \epsilon)$ is also finite.

Now consider the estimates $h^{(n)}$ generated by the algorithm $\phi^*$. Since $h^{(n)}$ is picked from the least $k$ such that $S_n \cap \mathcal{M}_k$ is nonempty, $h^{(n)}$ is guaranteed to be in $\mathcal{M}_{k_h}$ for all $n$. (This is because $S_n \cap \mathcal{M}_{k_h}$ is nonempty: it contains the true plant $h$.) Also $h^{(n)}$ is in the uncertainty set $S_n$, and by Lemma 3.6, $T_{u,\delta}(h^{(n)}, h) > n$. If we now take any $n > T(h, \epsilon)$, we have $T_{u,\delta}(h^{(n)}, h) > T(h, \epsilon)$, so $h^{(n)}$ is not in $A(h, \epsilon)$. But $h^{(n)}$ is in $\mathcal{M}_{k_h}$, so it follows that $\rho(h^{(n)}, h) < D(u, \mathcal{M}, \delta) + \epsilon$.

Since $\epsilon$ is arbitrary, it can now be concluded that

$$\limsup_{n \to \infty} \rho(h^{(n)}, h) \le D(u, \mathcal{M}, \delta),$$

completing the proof. □

The above construction of the asymptotically near-optimal algorithm $\phi^*$ can be viewed as an application of Occam's Razor: that one should always use the “simplest” theory to explain the given data. Here, as is true in general, there is no absolute measure of simplicity. Rather it is defined by the choice of the nested partitioning of the model set, $\mathcal{M} = \bigcup_i \mathcal{M}_i$. Given this nested structure, plants in the smaller $\mathcal{M}_i$'s are considered to be simpler than those in larger $\mathcal{M}_i$'s. Convergence of the estimates is guaranteed by always choosing the simplest plant that is consistent with the data seen so far. This avoids overfitting of data, a problem which crops up all the time in statistics and pattern recognition. It is interesting to note that this same principle of Occam's Razor has also been applied to guarantee convergence in distribution-free probabilistic learning problems [1], [30].
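The selection rule can be made concrete on a toy nested family. The following sketch is an illustration under stated assumptions, not the construction used in the proof: the hypothetical classes $\mathcal{M}_k$ are FIR plants of length $k$ whose taps lie on a small grid (`GRID`), consistency is checked in the sup norm on the first `len(y)` samples, and the simplest consistent plant is found by brute-force enumeration.

```python
from itertools import product

# Assumption: hypothetical coefficient grid defining the nested classes.
GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)


def output(u, g, n):
    """First n samples of the causal convolution u * g."""
    return [sum(u[j - i] * g[i] for i in range(min(j, len(g) - 1) + 1))
            for j in range(n)]


def consistent(u, y, g, delta):
    """True if g explains y up to a disturbance of size at most delta."""
    return all(abs(yj - oj) <= delta
               for yj, oj in zip(y, output(u, g, len(y))))


def occam_estimate(u, y, delta, max_order=4):
    """Return a consistent plant from the smallest class containing one."""
    for k in range(1, max_order + 1):          # simplest class first
        for g in product(GRID, repeat=k):      # all plants in class k
            if consistent(u, y, g, delta):
                return g
    return None


u = [1, -1, 1, 1, -1, -1]
h_true = (0.5, -1.0)
d = [0.05, -0.08, 0.03, 0.0, -0.06, 0.09]      # bounded by delta = 0.1
y = [a + b for a, b in zip(output(u, h_true, len(u)), d)]
estimate = occam_estimate(u, y, delta=0.1)
```

In this example no one-tap plant on the grid is consistent with the data, so the search moves to the two-tap class, where the only consistent grid point is the true plant.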

In contrast to the $\sigma$-compactness condition that guarantees convergence, a stronger compactness condition guarantees uniform convergence.

Proposition 3.9: Suppose convergence in the $\rho$-topology on $\mathcal{M}$ implies component-wise convergence of the impulse response. If the model set $\mathcal{M}$ is compact in the $\rho$-topology, then there is an algorithm $\phi$, the estimates of which will converge uniformly to within $D(u, \mathcal{M}, \delta)$ of the true plant; i.e., for all $\epsilon > 0$, there exists a time $T(\epsilon)$ such that for all $h \in \mathcal{M}$, $\|d\|_\infty \le \delta$,

$$\rho(\phi(P_n u, P_n(u * h + d)), h) < D(u, \mathcal{M}, \delta) + \epsilon \qquad \forall n > T(\epsilon).$$

Moreover, the algorithm does not require the knowledge of $\delta$, the bound on the disturbances, to compute its estimates.

Proof: An algorithm $\phi$ is defined as follows: for each $n$,

$$\phi(P_n u, P_n y) = \arg\min_{g \in \mathcal{M}} \|P_n(u * g - y)\|_\infty. \quad (3.7)$$

The minimum must exist since $\mathcal{M}$ is compact and $\|P_n(u * g - y)\|_\infty$ is a continuous function of $g$ in the product topology and hence in the $\rho$-topology. Also note that computing this estimate does not require the knowledge of $\delta$.

Now $y = u * h + d$ for some true plant $h$ and disturbance $d$ satisfying $\|d\|_\infty \le \delta$. By definition, the estimate at each time $n$ satisfies

$$\|P_n(u * \phi(P_n u, P_n y) - y)\|_\infty \le \|P_n(u * h - y)\|_\infty = \|P_n d\|_\infty \le \delta$$

and hence $\phi(P_n u, P_n y) \in S_n(\mathcal{M}, u, y, \delta)$ for each $n$, where $S_n$ is the finite-horizon uncertainty set at time $n$. We shall use only this property of the estimates of $\phi$ to show that they uniformly converge.

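Let $\epsilon > 0$. For each plant $h \in \mathcal{M}$, define

$$A(h, \epsilon) = \{g \in \mathcal{M} : \rho(g,h) \ge D(u, \mathcal{M}, \delta) + \epsilon\}. \quad (3.8)$$

Also, consider the number

$$T(\epsilon) = \sup_{h \in \mathcal{M}} \sup_{g \in A(h,\epsilon)} T_{u,\delta}(g,h) \quad (3.9)$$

where the function $T_{u,\delta}$ has been defined earlier. $T(\epsilon)$ can be rewritten as $\sup_{(g,h) \in B(\epsilon)} T_{u,\delta}(g,h)$, where

$$B(\epsilon) = \{(g,h) \in \mathcal{M}^2 : \rho(g,h) \ge D(u, \mathcal{M}, \delta) + \epsilon\}.$$

It is clear that $B(\epsilon)$ is a closed set and hence compact in the $\rho$-topology, being a subset of $\mathcal{M}^2$. Hence, $B(\epsilon)$ is also compact in the product topology. Now $T_{u,\delta}(g,h)$ is finite for all $(g,h)$ in $B(\epsilon)$, by Lemma 3.7. Hence, by Lemma 3.8, $T(\epsilon)$ is finite.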


Now if $n > T(\epsilon)$, then for any plant $h \in \mathcal{M}$ and $\|d\|_\infty \le \delta$ the estimate $h^{(n)}$ generated by the algorithm must lie in the uncertainty set $S_n(\mathcal{M}, u, u * h + d, \delta)$. Hence, by Lemma 3.6, $T_{u,\delta}(h^{(n)}, h) > n > T(\epsilon)$. This implies

$$\rho(h^{(n)}, h) < D(u, \mathcal{M}, \delta) + \epsilon.$$

Since this holds for all $h$ and $d$, the convergence is indeed uniform. □

IV. Application of General Framework to Specific Problems

The above results state that under some compactness conditions on the model set, the optimal worst-case asymptotic error achievable by any identification algorithm is characterized by the function $D(u, \mathcal{M}, \delta)$, measuring the worst-case uncertainty from infinite-horizon experiments. It describes the intrinsic difficulty of identifying plants in a given model set, independent of the specific identification algorithm used. This result enables us to move from the analysis of the error of specific algorithms to the analysis of the function $D(u, \mathcal{M}, \delta)$. In specific problems, we would like to find inputs $u$ such that $D(u, \mathcal{M}, \delta)$ is small or, at the very least, varies continuously with the noise bound $\delta$ at $\delta = 0$. This would imply that identification accuracy is robust to measurement noise.

The value of the diameter of information $D(u, \mathcal{M}, \delta)$ is in general difficult to evaluate because it is the supremum over the diameter of all possible infinite-horizon uncertainty sets. However, if the $\rho$ metric comes from a norm, it turns out that for an important class of model sets, $D(u, \mathcal{M}, \delta)$ has a simple characterization. These are the model sets which are convex and balanced. (A set $A$ is said to be balanced if for every $h$ in $A$, $-h$ is also in $A$.) The following proposition gives the characterization, and it follows from a basic result in information-based complexity.
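Proposition 4.1: Suppose $\rho(g,h) = \|g - h\|_X$ for some norm $\|\cdot\|_X$. If $\mathcal{M}$ is a balanced convex subset of $X$, then the worst-case diameter is attained when the true plant and the disturbance are both 0. That is,

$$D(u, \mathcal{M}, \delta) = \sup_{h \in \mathcal{M}} \sup_{\|d\|_\infty \le \delta} \operatorname{diam}(S_\infty(\mathcal{M}, u, u * h + d, \delta)) = \operatorname{diam}(S_\infty(\mathcal{M}, u, 0, \delta)).$$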

Now we will apply the general results proved above to analyze specific identification problems. We take our input set $U$ to be $Bl_\infty = \{u : \|u\|_\infty \le 1\}$, where $\|u\|_\infty = \sup_i |u_i|$. (The 1 is taken for normalization purposes.) The disturbance is assumed to be an $l_\infty$ signal $d$, with $\|d\|_\infty \le \delta$.
A. Identification of Stable Plants in the $l_1$ Norm

Here the metric considered is $\rho(g,h) = \|g - h\|_1$, and we restrict ourselves to stable plants with impulse responses of finite $l_1$ norm. We shall first prove a general lower bound for $D(u, \mathcal{M}, \delta)$ which holds for all inputs $u$
and for a wide class of model sets.

Proposition 4.2: Assume the model set $\mathcal{M}$ contains two plants at an $l_1$ distance of $2\delta$ apart. Then for any number of experiments $N$ and any set of inputs $u \in U$,

$$D(u, \mathcal{M}, \delta) \ge 2\delta.$$

Proof: Let $g, h \in \mathcal{M}$ satisfy $\|g - h\|_1 = 2\delta$. Suppose that $u$ are the inputs used in the identification experiments and $h$ is the actual plant. Let the disturbance be $d = u * (g - h)/2$. Note that $\|d\|_\infty \le \|u\|_\infty \|(g - h)/2\|_1 = \delta$.

The observed output is $y = u * h + d = u * (g + h)/2$. Now, $\|u * g - y\|_\infty = \|(1/2) u * (g - h)\|_\infty \le (1/2) \|u\|_\infty \|g - h\|_1 \le \delta$. Therefore, $g \in S_\infty(\mathcal{M}, u, y, \delta)$. Since $h$ is also in $S_\infty(\mathcal{M}, u, y, \delta)$, it follows that

$$\operatorname{diam}(S_\infty(\mathcal{M}, u, y, \delta)) \ge \|g - h\|_1 = 2\delta.$$

Since $D(u, \mathcal{M}, \delta)$ is the diameter of the largest possible uncertainty set, the desired lower bound follows. □
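The construction in this proof is easy to check numerically on a finite record. The sketch below is illustrative only: the two FIR plants, the input record, and the helper name `conv` are hypothetical choices, and the check covers only the first `len(u)` samples rather than the whole infinite horizon.

```python
def conv(u, h, n):
    """First n samples of the causal convolution u * h."""
    return [sum(u[j - i] * h[i] for i in range(min(j, len(h) - 1) + 1))
            for j in range(n)]


def sup_norm(s):
    return max(abs(v) for v in s)


g = [0.5, -0.25, 0.0]
h = [0.25, 0.0, 0.25]
# Choose delta so that ||g - h||_1 = 2 * delta, as in the proposition.
delta = sum(abs(a - b) for a, b in zip(g, h)) / 2
u = [1, -1, -1, 1, 1, 1, -1, 1]                  # ||u||_inf <= 1
n = len(u)

# Disturbance d = u * (g - h) / 2 and observed output y = u * h + d.
d = conv(u, [(a - b) / 2 for a, b in zip(g, h)], n)
y = [a + b for a, b in zip(conv(u, h, n), d)]

noise_ok = sup_norm(d) <= delta
g_consistent = sup_norm([a - b for a, b in zip(conv(u, g, n), y)]) <= delta
h_consistent = sup_norm([a - b for a, b in zip(conv(u, h, n), y)]) <= delta
```

Both plants reproduce the same record within the noise bound, so no algorithm can tell them apart from this data even though they are $2\delta$ apart in the $l_1$ norm.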

We now demonstrate that in fact, for all balanced and convex model sets of stable plants, this lower bound can be reached using just one input, provided that it satisfies a persistent excitation property.

Definition 4.3: Let $\mathcal{A}$ be the set of all finite sequences of 1's and $-1$'s:

$$\mathcal{A} = \{(a_1, a_2, \ldots, a_k) : k \ge 1,\ a_i \in \{1, -1\}\ \forall i\}. \quad (4.10)$$

The sequence $v \in Bl_\infty$ is said to contain all finite sequences of 1's and $-1$'s if for every finite sequence $a \in \mathcal{A}$, there exist $m, n$ such that $(v_m, v_{m+1}, \ldots, v_{m+n}) = a$.
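One simple way to obtain such a sequence is to concatenate every finite pattern in order of length. The sketch below builds only a finite prefix, so it is an illustration under that assumption: the prefix contains every pattern up to a chosen length `max_len`, while a sequence satisfying Definition 4.3 would continue the same construction indefinitely. The function names are hypothetical.

```python
from itertools import product


def universal_prefix(max_len):
    """Concatenate all +/-1 patterns of length 1, 2, ..., max_len."""
    seq = []
    for k in range(1, max_len + 1):
        for pattern in product((1, -1), repeat=k):
            seq.extend(pattern)
    return seq


def find(seq, pattern):
    """Index m at which pattern first occurs in seq, or None."""
    p = list(pattern)
    for m in range(len(seq) - len(p) + 1):
        if seq[m:m + len(p)] == p:
            return m
    return None


v = universal_prefix(3)
all_found = all(
    find(v, a) is not None
    for k in range(1, 4)
    for a in product((1, -1), repeat=k)
)
```

The prefix for patterns up to length 3 has $2 \cdot 1 + 4 \cdot 2 + 8 \cdot 3 = 34$ samples; the length grows quickly with `max_len`, which is one reason shorter constructions may be preferred in practice.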

Theorem 4.4: Assume $\mathcal{M}$ is balanced and convex and contains only stable plants. If $u^*$ contains all finite sequences of 1's and $-1$'s, then

$$D(u^*, \mathcal{M}, \delta) \le 2\delta.$$

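Proof: By Proposition 4.1, the diameter of information is given by the diameter of the uncertainty set centered at 0:

$$D(u^*, \mathcal{M}, \delta) = \operatorname{diam}(S_\infty(\mathcal{M}, u^*, 0, \delta)).$$

Consider any $g \in S_\infty(\mathcal{M}, u^*, 0, \delta)$ and let $\epsilon > 0$. Since $g$ is stable, there exists $M$ such that

$$\sum_{k=M+1}^{\infty} |g_k| < \epsilon. \quad (4.11)$$

Now consider the finite sequence

$$(\operatorname{sgn}(g_M), \operatorname{sgn}(g_{M-1}), \ldots, \operatorname{sgn}(g_0)) \in \mathcal{A}$$

where sgn is the signum function such that $\operatorname{sgn}(x) = 1$ if $x \ge 0$ and $\operatorname{sgn}(x) = -1$ if $x < 0$.

By definition of the sequence $u^*$, there exists $m$ such that

$$u^*_m = \operatorname{sgn}(g_M),\ u^*_{m+1} = \operatorname{sgn}(g_{M-1}),\ \ldots,\ u^*_{m+M} = \operatorname{sgn}(g_0).$$

We then have

$$|(u^* * g)_{m+M}| = \left| \sum_{k=0}^{m+M} u^*_{m+M-k}\, g_k \right|$$

$$= \left| \sum_{k=0}^{M} u^*_{m+M-k}\, g_k + \sum_{k=M+1}^{m+M} u^*_{m+M-k}\, g_k \right|$$

$$= \left| \sum_{k=0}^{M} \operatorname{sgn}(g_k)\, g_k + \sum_{k=M+1}^{m+M} u^*_{m+M-k}\, g_k \right|$$

$$\ge \sum_{k=0}^{M} |g_k| - \sum_{k=M+1}^{m+M} |g_k|$$

$$\ge \|g\|_1 - 2\epsilon. \quad (4.12)$$

The last step uses (4.11) twice: the first sum is at least $\|g\|_1 - \epsilon$ and the second is less than $\epsilon$.

But $g \in S_\infty(\mathcal{M}, u^*, 0, \delta)$, so $|(u^* * g)_{m+M}| \le \delta$. Hence, it follows from inequality (4.12) that $\|g\|_1 \le \delta + 2\epsilon$. Since this is true for every $\epsilon > 0$, it follows that $\|g\|_1 \le \delta$ for any $g \in S_\infty(\mathcal{M}, u^*, 0, \delta)$. Thus,

$$D(u^*, \mathcal{M}, \delta) = \operatorname{diam}(S_\infty(\mathcal{M}, u^*, 0, \delta)) = \sup_{g \in S_\infty(\mathcal{M}, u^*, 0, \delta)} 2\|g\|_1 \le 2\delta. \quad \Box$$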
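The key step of the proof, that the output at a suitable time picks up the full $l_1$ norm of the plant, can be checked for an FIR plant, where the tail in (4.11) is exactly zero. The sketch below is an illustration under assumptions: the plant `g` is a hypothetical three-tap example, and the input is a finite prefix built by concatenating all $\pm 1$ patterns up to the plant length.

```python
from itertools import product


def universal_prefix(max_len):
    """Concatenate all +/-1 patterns of length 1, 2, ..., max_len."""
    seq = []
    for k in range(1, max_len + 1):
        for pattern in product((1, -1), repeat=k):
            seq.extend(pattern)
    return seq


def sgn(x):
    return 1 if x >= 0 else -1


g = [0.5, -0.25, 0.125]            # FIR plant, so the tail beyond M is zero
M = len(g) - 1
u = universal_prefix(len(g))

# Pattern (sgn(g_M), ..., sgn(g_0)) and the time m at which it starts in u.
pattern = [sgn(g[M - j]) for j in range(M + 1)]
m = next(i for i in range(len(u) - M) if u[i:i + M + 1] == pattern)

# Output sample (u * g)_{m+M}; only taps 0..M contribute for an FIR plant.
value = sum(u[m + M - k] * g[k] for k in range(M + 1))
l1_norm = sum(abs(x) for x in g)
```

Because every term $u_{m+M-k}\, g_k$ equals $|g_k|$ at this time instant, a noise-free output bounded by $\delta$ forces $\|g\|_1 \le \delta$, which is the mechanism behind the $2\delta$ bound.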

An input satisfying the above condition has been proposed independently by Mäkilä [25] for $l_1$ identification. It is also of interest to note that the random binary sequence, a commonly used identification input generated by randomly and independently picking each value to be 1 or $-1$, has the desired property of containing all finite sequences of 1's and $-1$'s, with probability 1.

Using the above result on the infinite-horizon diameter of information, we shall analyze the optimal asymptotic $l_1$ error for stable model sets.

The consistency result proved earlier applies to $\sigma$-compact model sets. The following technical lemma concerning the asymptotic $l_1$ error enables us to extend the result to model sets which are closures of $\sigma$-compact model sets as well.

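Lemma 4.5: For any model set $\mathcal{M}$, inputs $u \in Bl_\infty$, algorithm $\phi$ and $\delta > 0$,

$$e^1_\infty(\phi, \bar{\mathcal{M}}, u, \delta) \le \lim_{x \downarrow \delta} e^1_\infty(\phi, \mathcal{M}, u, x)$$

where $\bar{\mathcal{M}}$ is the closure of $\mathcal{M}$ with respect to the $l_1$ topology on $l_1$. (The superscript “1” emphasizes that the metric used is the $l_1$ norm.)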

Proof: By definition, for all $x > 0$, and for all $h \in \mathcal{M}$ and $d$ with $\|d\|_\infty \le x$, we have

$$\limsup_{n \to \infty} \|\phi(P_n u, P_n(u * h + d)) - h\|_1 \le e^1_\infty(\phi, \mathcal{M}, u, x). \quad (4.13)$$

Let $\epsilon > 0$. Take any $h \in \bar{\mathcal{M}}$ and $\|d\|_\infty \le \delta$. There exists an $h' \in \mathcal{M}$ such that $\|h - h'\|_1 < \epsilon$. Therefore

$$\limsup_{n \to \infty} \|\phi(P_n u, P_n(u * h + d)) - h\|_1 \quad (4.14)$$

$$\le \limsup_{n \to \infty} \|\phi(P_n u, P_n(u * h' + u * (h - h') + d)) - h'\|_1 + \epsilon. \quad (4.15)$$

Now, $\|u * (h - h') + d\|_\infty \le \delta + \epsilon$, so applying inequality (4.13) with $x = \delta + \epsilon$,

$$\limsup_{n \to \infty} \|\phi(P_n u, P_n(u * h' + u * (h - h') + d)) - h'\|_1 \le e^1_\infty(\phi, \mathcal{M}, u, \delta + \epsilon). \quad (4.16)$$

It follows that

$$\limsup_{n \to \infty} \|\phi(P_n u, P_n(u * h + d)) - h\|_1 \le e^1_\infty(\phi, \mathcal{M}, u, \delta + \epsilon) + \epsilon. \quad (4.17)$$

Letting $\epsilon$ go to 0 gives the desired result. □

We  now  show  that  we  can  get  very  good  asymptotic 
error  even  if  there  is  no  additional  prior  knowledge  about 
the  plant  other  than  the  fact  that  it  is  stable. 

Proposition 4.6: Take the model set to be $l_1$, the space of all stable plants. There is a single experiment, using any input $u^* \in Bl_\infty$ containing all sequences of 1's and $-1$'s, such that for every $\delta > 0$, the optimal asymptotic $l_1$ error satisfies

$$E^1_\infty(u^*, l_1, \delta) \le 2\delta.$$

Proof: The space $l_1$ is separable, i.e., it is a closure of a countable set $\mathcal{M}_c$. Since a countable set is clearly $\sigma$-compact, by Theorem 3.4, there is an algorithm $\phi^*$ such that for every $\delta > 0$ and inputs $u$,

$$e^1_\infty(\phi^*, \mathcal{M}_c, u, \delta) \le D(u, \mathcal{M}_c, \delta). \quad (4.18)$$

Now, using any input $u^*$ containing all sequences of 1's and $-1$'s, we have

$$e^1_\infty(\phi^*, l_1, u^*, \delta) \le \lim_{x \downarrow \delta} e^1_\infty(\phi^*, \mathcal{M}_c, u^*, x) \quad \text{by Lemma 4.5}$$

$$\le \lim_{x \downarrow \delta} D(u^*, \mathcal{M}_c, x)$$

$$\le 2\delta \quad \text{by Theorem 4.4.} \quad \Box$$

Hence, to identify a plant accurately in the limit, it is enough to know a priori that it is stable; no additional information, such as bounds on decay rate and gain, is necessary. The achievable accuracy varies continuously with the noise bound $\delta$ for small $\delta$; thus, identification can be performed robustly with respect to measurement noise. One should also note that there are many other choices of decomposing the model set into compact sets. The decomposition should be done to facilitate a more efficient implementation of the identification algorithm. We will discuss this at the end of this section.

Next, we look at the issue of uniform convergence. For the model set $l_1$ it can at once be seen that although convergence to a small asymptotic error is possible, such convergence cannot be uniform.

Proposition 4.7: Let $\phi$ be any algorithm and $u$ be any input. Then for every $n$ and for every $M$, there exists an $h \in l_1$ such that

$$\|\phi(P_n u, P_n(u * h)) - h\|_1 > M.$$

Proof: This is clear because making $n$ measurements gives no information on the part of the impulse response after time $n$, which can have arbitrarily large uncertainty in the $l_1$ norm. □

To guarantee uniform convergence, we need to look at compact model sets.

Proposition 4.8: Let $\mathcal{M} \subset \mathcal{M}_{\mathrm{stab}}$ be a compact set (in the $l_1$ topology) or a subset of a compact set in $\mathcal{M}_{\mathrm{stab}}$. For the single input $u^*$ which contains all finite sequences of 1's and $-1$'s, there is an algorithm the estimates of which converge, uniformly for all $h \in \mathcal{M}$ and all $\|d\|_\infty \le \delta$, to an $l_1$ ball of radius $2\delta$ around the true plant. Moreover, the algorithm does not require the knowledge of the value of $\delta$ to compute its estimates.

Common examples of such compact model sets are the uniformly stable ones, of the form $\mathcal{M}_s(g) = \{h : |h_i| \le |g_i| \text{ for all } i\}$ where $g$ is any stable plant. The specific model sets considered in [8] and [9] belong to this class.

Identification Algorithms for Stable Plants: For certain parameterizations of the space of stable plants, it is possible to devise algorithms based on the Occam's Razor Principle that involve linear programming problems. Define the compact sets:

$$\mathcal{M}_k = \{h \in l_1 : |h_i| \le kM,\ h_i = 0\ \forall i \ge k\}$$

where $M$ is any positive real number. It can be immediately seen that

$$l_1 = \text{closure of } \bigcup_{k=1}^{\infty} \mathcal{M}_k.$$

Fix some tolerance level $\epsilon$. The estimator can be described as picking a feasible element in the set

$$\mathcal{M}_k \cap S_n(\mathcal{M}, u, y, \delta + \epsilon)$$

for any input-output pair. Of course, this set is characterized by linear constraints and finding a feasible plant is equivalent to solving a linear programming problem. The estimate is picked from the smallest $\mathcal{M}_k$ for which the above set is not empty.

Suppose that the model set is equal to $\mathcal{M}_s(g)$ where $g \in l_1$ and $g_i = 0\ \forall i > l$. This set contains only FIR plants of length $l$, with a bound on the impulse response. For this model set the near-optimal algorithm $\phi^*$ is given by

$$\phi^*(P_n u, P_n y) = \arg\min_{|h_i| \le |g_i|,\ i = 0, 1, \ldots, l} \|P_n(y - u * h)\|_\infty$$

which is computable by linear programming. We finally note that work on algorithms is still an active area of research [34].
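To make the linear programming claim concrete, the minimization over bounded FIR plants can be written in the usual epigraph form. This is a sketch under two assumptions that are not fixed by the text here: $P_n$ keeps the output samples $j = 0, \ldots, n-1$, and the scalar $t$ is an auxiliary variable introduced only for this illustration.

```latex
\begin{aligned}
\min_{h_0, \ldots, h_l,\; t} \quad & t \\
\text{subject to} \quad
  & -t \;\le\; y_j - \sum_{i=0}^{\min(j,\,l)} u_{j-i}\, h_i \;\le\; t,
      && j = 0, 1, \ldots, n-1, \\
  & -|g_i| \;\le\; h_i \;\le\; |g_i|,
      && i = 0, 1, \ldots, l.
\end{aligned}
```

At the optimum, $t$ equals the sup norm of the truncated residual $P_n(y - u * h)$, so the objective and all constraints are linear in the unknowns $(h_0, \ldots, h_l, t)$ and any standard LP solver applies. The problem has $l + 2$ variables and $2n + 2(l + 1)$ inequality constraints.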
B. $H_\infty$ Identification of Stable Rational Plants

We now analyze optimal identification using the model set $RH_\infty$, the space of all stable plants with rational transfer functions. The error metric used is the $H_\infty$ norm. The model set $RH_\infty$ is $\sigma$-compact in the $H_\infty$-topology. (For example, it can be decomposed as a countable union of compact sets of the form $\{h : |h_i| \le A a^i\}$ with $A$ tending to infinity and $a$ tending to 1.) Convergence in $H_\infty$ implies component-wise convergence of the impulse response in each of these sets. Hence, the consistency result applies and we are reduced to the analysis of the infinite-horizon diameter of information.

Since the $H_\infty$-norm of a plant is always upper bounded by its $l_1$ norm, Theorem 4.4 implies that, measured in the $H_\infty$ norm, the infinite-horizon diameter of information $D(u^*, RH_\infty, \delta)$ using an input $u^*$ containing all finite sequences of 1's and $-1$'s is also bounded by $2\delta$. Hence, the worst-case asymptotic error using this input is also bounded by $2\delta$. The following result shows that this input is optimal to within a factor of two.

Proposition 4.9: For any number of experiments $N$ and any choice of inputs $u \in Bl_\infty$, the $H_\infty$ infinite-horizon diameter of information satisfies:

$$D(u, RH_\infty, \delta) \ge 2\delta.$$

Proof: The proof is trivial. Take $g = (\delta, 0, 0, \ldots)$, $h = (-\delta, 0, 0, \ldots)$, $d = -\delta u$, $d' = \delta u$. Then $u * g + d = u * h + d'$, so $D(u, RH_\infty, \delta) \ge \|g - h\|_{H_\infty} = 2\delta$. □

A  similar  result  on  frequency  response  experiments  is 
given  by  [9]. 

C.  Identification  of  Unstable  Plants  in  the  Gap  Metric 

Our general framework of optimal asymptotic identification applies, to a large extent, to unstable as well as stable systems. In particular, the consistency and uniform convergence results, for arbitrary inputs, hold regardless of whether the model set contains stable or unstable systems. There is, however, an important issue in the identification of unstable systems which is not dealt with in this framework. While stable systems can be identified in the open-loop, identification experiments for unstable systems are almost always performed in the closed-loop to avoid unbounded outputs. As opposed to open-loop identification, there is no complete freedom in choosing the inputs $u$ for closed-loop identification experiments, as there is a coupling between the input and the output. This makes the experiment design problem much more difficult. In this section, we shall ignore the coupling and confine ourselves to deriving necessary and sufficient conditions on the inputs for accurate asymptotic identification of unstable systems. The question of whether one can design closed-loop experiments to achieve such conditions is left open.

An appropriate error metric to use for unstable plants is the gap metric [6], [33], [42]. The important property of the gap metric is that it generates the graph topology [40], which is the weakest topology in which closed-loop stability is a robust property, or in which the closed-loop system varies continuously as a function of the open-loop system. Intuitively, this means that identifying plants accurately in the gap metric is the least that one must do to be able to design controllers to guarantee that the closed-loop performance will be close to the desired performance.

The gap between two possibly unstable plants is given in terms of their graphs, so we will first define this notion. The graph $G_h$ of a plant $h$ is a subset of the space $l_2 \times l_2$, defined by

$$G_h = \{(x, h * x) : x \in l_2,\ h * x \in l_2\}.$$

Thus, the graph of a plant describes its behavior on bounded-energy inputs which yield bounded-energy outputs. The directed gap between two graphs $G_h$ and $G_g$ is defined as

$$\delta(G_h, G_g) = \sup_{x \in G_h,\ \|x\|_2 \le 1}\ \inf_{y \in G_g} \|x - y\|_2.$$

The gap between two plants is given by the maximum of the two directed gaps between the two graphs:

$$\delta(g, h) = \max(\delta(G_g, G_h),\ \delta(G_h, G_g)).$$

It can be verified that the gap is indeed a metric, and that its value is always bounded between 0 and 1.

In the analysis below, we shall restrict ourselves to the space of finite-dimensional systems, $\mathcal{M}_{fd}$, with rational z-transform.¹ In this space, convergence in the graph topology can be expressed in terms of the coprime factors: $P_i \to P$ in the graph topology iff there exist coprime factorizations $P_i = N_i/D_i$, $P = N/D$ such that $N_i \to N$ and $D_i \to D$ in the $H_\infty$-topology. Results obtained for finite-dimensional plants are also valid for infinite-dimensional systems that can be approximated by finite-dimensional systems in the gap metric.

To  appiy  the  consistency  results  we  proved  earlier,  we 
have  to  investigate  the  topological  properties  of  Mfi. 

Proposition 4.10: Let $p$, $q$ be nonnegative integers, $K$, $\alpha$
be positive real numbers and $\mathfrak{M}_{fd}(p, q, K, \alpha)$ be the class
of all finite-dimensional systems having z-transforms

$$\frac{b_p z^p + \cdots + b_0}{z^q + a_{q-1} z^{q-1} + \cdots + a_0}$$

with bounded parameters: $|a_i| \le K$ and $|b_i| \le K$ for all $i$,
and with the distance between any pole-zero pair $\ge \alpha$.
$\mathfrak{M}_{fd}(p, q, K, \alpha)$ is compact in the graph topology, and on
this set the graph topology is finer than the product
topology.

Proof: Let $\{P_i(z)\}$ be a sequence of plants in
$\mathfrak{M}_{fd}(p, q, K, \alpha)$, and suppose $P_i = N_i/D_i$, with
$\deg N_i \le p$, $\deg D_i = q$, $D_i$ monic, and the coefficients
of $N_i$ and $D_i$ bounded by $K$. Clearly, $N_i$ and $D_i$ lie in sets
which are compact in the $H_\infty$-topology. Hence, there exists a
subsequence with $N_{k_i} \to N^*$ and $D_{k_i} \to D^*$. We now verify that
$P^* = N^*/D^*$ is in $\mathfrak{M}_{fd}(p, q, K, \alpha)$. We first note that $H_\infty$
convergence of polynomials of bounded degree is equivalent
to convergence of their coefficients. Hence, $\deg N^* \le p$,
$\deg D^* = q$, $D^*$ is monic, and their coefficients are
bounded by $K$. Moreover, since the location of the zeros
of a polynomial is a continuous function of its coefficients,
the zeros of $N_{k_i}$, $D_{k_i}$ must converge to those of $N^*$, $D^*$,
respectively, and the separation between poles and zeros is
maintained at a distance of at least $\alpha$. Hence,
$P^* \in \mathfrak{M}_{fd}(p, q, K, \alpha)$ and $P_{k_i} \to P^*$ in the graph topology.
This shows that $\mathfrak{M}_{fd}(p, q, K, \alpha)$ is compact in the graph
topology. Also, in $\mathfrak{M}_{fd}(p, q, K, \alpha)$, convergence in the
graph implies convergence in the coefficients of the rational
transfer function, which in turn implies the convergence
in each component of the impulse response. This
latter fact follows by inspection of the inversion formula
for z-transforms. □

1 In this paper, the z-transform of a system with impulse response h is
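
The membership conditions of Proposition 4.10 can be checked numerically for a given rational transfer function. The following is a minimal sketch, assuming coefficients are listed from the highest power down and that the separation parameter is the minimum distance over all pole-zero pairs; the function name `in_model_class` and the argument layout are illustrative and not taken from the paper.

```python
import numpy as np

def in_model_class(b, a, p, q, K, alpha):
    """Check the three conditions of the class: degrees, coefficient bounds,
    and pole-zero separation.

    b: numerator coefficients [b_p, ..., b_0], highest power first.
    a: denominator coefficients [1, a_{q-1}, ..., a_0], monic, highest power first.
    """
    b = np.atleast_1d(np.asarray(b, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    # Degree conditions: deg N <= p, deg D = q, D monic.
    if len(b) - 1 > p or len(a) - 1 != q or a[0] != 1.0:
        return False
    # Coefficient bounds |a_i| <= K and |b_i| <= K.
    if np.max(np.abs(b)) > K or np.max(np.abs(a[1:]), initial=0.0) > K:
        return False
    zeros = np.roots(b)
    poles = np.roots(a)
    if len(zeros) == 0 or len(poles) == 0:
        return True  # no pole-zero pair to separate
    # Minimum distance over all pole-zero pairs must be at least alpha.
    separation = np.min(np.abs(zeros[:, None] - poles[None, :]))
    return bool(separation >= alpha)

# Zero at 0.5, pole at 2: separation 1.5.
print(in_model_class([1.0, -0.5], [1.0, -2.0], p=1, q=1, K=2.0, alpha=1.0))
```

The separation test is what keeps near pole-zero cancellations out of the class, which is the property the compactness argument relies on.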

It is clear that the space of all finite-dimensional systems
$\mathfrak{M}_{fd}$ is a countable union of sets of the form
$\mathfrak{M}_{fd}(p, q, K, \alpha)$. It then follows that Theorem 3.4 can be
applied on $\mathfrak{M}_{fd}$ equipped with the gap metric, and the
infinite-horizon diameter of information $D_{gap}(u, \mathfrak{M}_{fd}, \delta)$
characterizes the optimal asymptotic error $E_\infty(u, \mathfrak{M}_{fd}, \delta)$.

We shall first derive necessary conditions on the inputs
$u$ for the robustness of the asymptotic error to measurement
noise, i.e., when $E_\infty(u, \mathfrak{M}_{fd}, \delta)$ approaches 0 as $\delta$
approaches 0. This is in terms of the notion of stability
testing: inputs $u = [u^{(1)}, u^{(2)}, \ldots, u^{(N)}]$ are said to be able to
test the stability of plants if for every unstable $h \in \mathfrak{M}_{fd}$ at
least one of the inputs $u^{(i)}$ yields an unbounded output.
We have the following result on the loss of robustness
when the inputs are not rich enough to test stability.

Proposition 4.11: If the inputs $u$ cannot test stability,
then $D_{gap}(u, \mathfrak{M}_{fd}, \delta) = 1$ for all $\delta > 0$.

Proof: Let $\delta > 0$. Consider the infinite-horizon
uncertainty set centered at the origin:

$$S_\infty(\mathfrak{M}_{fd}, u, 0, \delta) = \{g \in \mathfrak{M}_{fd} : \|u * g\|_\infty \le \delta\}.$$

Since $u$ cannot test the stability of plants in $\mathfrak{M}_{fd}$, there
must be an unstable plant $h \in \mathfrak{M}_{fd}$ such that $u * h$ is
bounded; by appropriate scaling, we can assume that
$h \in S_\infty(\mathfrak{M}_{fd}, u, 0, \delta)$. Since the zero plant is also in this
uncertainty set and the gap distance between the zero
plant and any unstable plant is 1 [6], the diameter of this
uncertainty set must be 1. Hence, the diameter of information,
which is the diameter of the largest uncertainty
set, is also 1. □

We  now  give  explicit  necessary  and  sufficient  conditions 
for  inputs  to  be  able  to  test  stability.  We  begin  with  two 
definitions. 

Definition 4.12: For a sequence $u \in \ell_\infty$, let $Z(u)$ denote
the set of all zeros of its z-transform $U(z)$ inside the
open unit disk. (Note that $U(z)$ is analytic inside the
open unit disk.)

Definition 4.13: A sequence $u$ is said to excite at frequency
$\omega \in [0, 2\pi]$ if

$$\limsup_{n \to \infty} \left| \sum_{k=0}^{n} u_k e^{-jk\omega} \right| = \infty,$$

i.e., the Fourier series of $u$ at $\omega$ is unbounded. Let $\Omega(u)$
denote the set of all frequencies at which $u$ excites.
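
Definition 4.13 can be explored numerically by tracking the magnitude of the partial Fourier sums. The sketch below is only suggestive, since a finite record cannot prove that a limsup is infinite; the helper name `partial_fourier_sums` and the choice of a pure cosine test input are assumptions made for illustration.

```python
import numpy as np

def partial_fourier_sums(u, omega):
    """Magnitudes |sum_{k=0}^{n} u_k exp(-j k omega)| for n = 0, ..., len(u)-1."""
    k = np.arange(len(u))
    return np.abs(np.cumsum(u * np.exp(-1j * k * omega)))

n = 4000
omega0 = 1.0
u = np.cos(omega0 * np.arange(n))        # bounded input concentrated at omega0

on_freq = partial_fourier_sums(u, omega0)   # grows roughly like n / 2
off_freq = partial_fourier_sums(u, 2.0)     # stays bounded

print(on_freq[-1], off_freq.max())
```

The cosine input excites at its own frequency, where the partial sums grow linearly, and does not excite at the other frequency tested, where they remain bounded.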

We  shall  now  give  the  following  result,  the  proof  of 
which  can  be  found  in  the  Appendix. 

Theorem 4.14: $\mathfrak{M}_{fd}$ is testable for stability by bounded
inputs $u^{(1)}, \ldots, u^{(N)}$ if and only if the inputs have the
following properties:

1) $\bigcup_{i=1}^{N} \Omega(u^{(i)}) = [0, 2\pi]$

2) $\bigcap_{i=1}^{N} Z(u^{(i)}) = \emptyset$.

Hence, the inputs can test the stability of finite-dimensional
plants if and only if they excite at all frequencies
and have no common zeros in the unit disk.

We have the following corollary.

Corollary 4.15: $\mathfrak{M}_{fd}$ is testable for stability by a single
input $u \in B\ell_\infty$ if and only if $u$ excites at all frequencies
and its z-transform has no zeros inside the open unit disk.
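
For inputs of finite length, the second condition of Theorem 4.14 can be checked by computing polynomial roots. The sketch below assumes the transform convention $U(z) = \sum_k u_k z^k$, which is consistent with the remark that $U(z)$ is analytic inside the open unit disk for bounded $u$; the function names and tolerances are illustrative choices, not part of the paper.

```python
import numpy as np

def zeros_in_disk(u, tol=1e-9):
    """Zeros of U(z) = sum_k u[k] z**k that lie strictly inside the unit disk."""
    coeffs = np.asarray(u, dtype=float)[::-1]   # np.roots wants highest power first
    roots = np.roots(coeffs)
    return roots[np.abs(roots) < 1 - tol]

def have_common_zero(inputs, tol=1e-6):
    """True if some point of the open unit disk is a zero of every input's transform."""
    zero_sets = [zeros_in_disk(u) for u in inputs]
    for z in zero_sets[0]:
        if all(np.any(np.abs(s - z) < tol) for s in zero_sets[1:]):
            return True
    return False

u1 = [1.0, -2.0]          # U(z) = 1 - 2z, zero at 0.5
u2 = [1.0, -1.0, -2.0]    # U(z) = (1 - 2z)(1 + z), zeros at 0.5 and -1
impulse = [1.0]           # U(z) = 1, no zeros

print(have_common_zero([u1, u2]))        # the two inputs share the zero at 0.5
print(have_common_zero([u1, impulse]))   # adding the impulse removes any common zero
```

Pairing any input with the unit impulse empties the intersection of the zero sets, because the impulse's transform has no zeros at all.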

Neither  the  existence  nor  the  nonexistence  of  a  bounded 
input  having  both  the  properties  required  by  Corollary 
4.15  has  been  established.  However,  bounded  inputs  which 
excite  at  all  frequencies  do  exist.  In  fact,  Lusin  [19]  has 
constructed  a  sequence  which  excites  at  all  frequencies 
despite  the  fact  that  the  sequence  actually  tends  to  0. 

Stability testing is a necessary property the inputs must
satisfy in order to have robustness in the asymptotic error.
It will now be shown that stability testing combined with
the property of containing all finite sequences of 1's and
-1's is in fact sufficient to guarantee robustness.

Theorem 4.16: If the inputs $u$ can test stability and at
least one of them contains all finite sequences of 1's and
-1's, then for all $\delta > 0$,

$$D_{gap}(u, \mathfrak{M}_{fd}, \delta) \le 2\delta.$$

Proof: Consider now the infinite-horizon uncertainty
set $S_\infty(\mathfrak{M}_{fd}, u, 0, \delta)$ centered at the origin. Since all the
plants in this set give bounded outputs on the inputs and the
inputs test stability, all the plants in this set must be
stable. Moreover, one of the inputs contains all finite
sequences of 1's and -1's. We are now in a similar
situation as in Theorem 4.4, which applies to the stable
plant case. The same arguments as in the proof of that theorem
show that the diameter of this uncertainty set measured
in the $\ell_1$ norm is bounded by $2\delta$. Since $\mathfrak{M}_{fd}$ is
balanced and convex, the diameter of information equals the
diameter of this set (measured in the $\ell_1$ norm). Finally, by
a result proved in the Appendix, the gap distance between
two plants is always bounded by the $H_\infty$ distance, and
therefore also by the $\ell_1$ distance. Hence, the diameter of
information $D_{gap}(u, \mathfrak{M}_{fd}, \delta)$ measured in the gap metric
is bounded by the diameter of information measured in
the $\ell_1$ norm, and hence also bounded by $2\delta$. □

We will now exhibit two inputs which have the above
desired properties. First, it will be demonstrated that any
input that contains all finite sequences of 1's and -1's
excites at all frequencies.

Proposition 4.17: Let $u$ be any sequence which contains
all finite sequences of 1's and -1's. Then $\Omega(u) = [0, 2\pi]$.

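Proof: Let $\omega_0$ be an arbitrary frequency in $[0, 2\pi]$.
Take any $M > 0$. The sum $\sum_k |\cos k\omega_0|$ is divergent, so we
can find an integer $L$ such that $\sum_{k=0}^{L} |\cos k\omega_0| > M$. By
the definition of the sequence $u$, there exists an integer $n_1$
such that

$$(u_{n_1}, u_{n_1+1}, \ldots, u_{n_1+L}) = (1, \operatorname{sgn}(\cos \omega_0), \operatorname{sgn}(\cos 2\omega_0), \ldots, \operatorname{sgn}(\cos L\omega_0)). \qquad (4.19)$$

Now,

$$\left| \sum_{k=n_1}^{n_1+L} u_k e^{-jk\omega_0} \right|
= \left| \sum_{k=n_1}^{n_1+L} \operatorname{sgn}(\cos (k - n_1)\omega_0)\, e^{-jk\omega_0} \right|
= \left| \sum_{k=0}^{L} \operatorname{sgn}(\cos k\omega_0)\, e^{-jk\omega_0} \right|
\ge \left| \sum_{k=0}^{L} \operatorname{sgn}(\cos k\omega_0) \cos k\omega_0 \right|
> M.$$

The block sum on the left is the difference of two partial sums
starting at $k = 0$, so at least one of them has magnitude
exceeding $M/2$. This is true for every $M$, so
$\limsup_{n \to \infty} \left| \sum_{k=0}^{n} u_k e^{-jk\omega_0} \right| = \infty$. □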
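
The construction in the proof of Proposition 4.17 can be reproduced on a finite prefix. The sketch below builds one possible sequence containing every word of 1's and -1's up to a given length by simple concatenation (an illustrative choice, not a construction specified in the paper), locates the sign pattern of (4.19), and compares the block sum with $\sum_{k=0}^{L} |\cos k\omega_0|$. The names `universal_pm1_sequence` and `find_block` are hypothetical.

```python
import itertools
import numpy as np

def universal_pm1_sequence(max_len):
    """Concatenate every word over {1, -1} of length 1, 2, ..., max_len."""
    out = []
    for m in range(1, max_len + 1):
        for word in itertools.product((1, -1), repeat=m):
            out.extend(word)
    return np.array(out)

def find_block(u, pattern):
    """Index of the first occurrence of pattern in u, or -1 if absent."""
    m = len(pattern)
    for n1 in range(len(u) - m + 1):
        if np.array_equal(u[n1:n1 + m], pattern):
            return n1
    return -1

omega0 = 0.7
L = 7
# Sign pattern (1, sgn cos(omega0), ..., sgn cos(L * omega0)) from (4.19).
pattern = np.where(np.cos(omega0 * np.arange(L + 1)) >= 0, 1, -1)

u = universal_pm1_sequence(L + 1)
n1 = find_block(u, pattern)

k = np.arange(n1, n1 + L + 1)
block_sum = abs(np.sum(u[k] * np.exp(-1j * k * omega0)))
target = np.sum(np.abs(np.cos(omega0 * np.arange(L + 1))))

print(n1, block_sum, target)
```

The block sum is never smaller than the target because its magnitude dominates its real part after removing the unit-modulus factor $e^{-jn_1\omega_0}$.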

Using two inputs, one of which contains all finite
sequences of 1's and -1's and the other being the unit impulse,
will suffice to test stability, since the former excites at all
frequencies and the latter's z-transform has no zeros in
the unit disk. It follows immediately from Theorem
4.16 that an optimal worst-case gap error of $2\delta$ can be
achieved with these two inputs.

This result shows that for finite-dimensional plants,
identification in the gap metric can be performed robustly
with respect to the noise level $\delta$, i.e., as $\delta$ goes to zero,
the identification error also goes to zero. However, we have
not yet shown that the two experiments are optimal or near
optimal. A lower bound on the optimal asymptotic gap
error using any bounded inputs will now be derived. This
will show that for small $\delta$, the above experiment design is
no more than a factor of two from optimality.

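Proposition 4.18: For any $N$ and inputs $u \in B\ell_\infty$, the
optimal worst-case asymptotic gap error for finite-dimensional
plants satisfies

$$E_\infty(u, \mathfrak{M}_{fd}, \delta) \ge \frac{\delta}{\sqrt{1 + \delta^2}}.$$

Proof: To prove this result, it suffices to show that
the infinite-horizon gap diameter of information satisfies

$$D_{gap}(u, \mathfrak{M}_{fd}, \delta) \ge \frac{2\delta}{\sqrt{1 + \delta^2}}.$$

We make use of the following lower bound for the gap
metric [44]:

$$\delta(h, 0) \ge \frac{\|h\|_{H_\infty}}{\sqrt{1 + \|h\|_{H_\infty}^2}}.$$

Now,

$$\begin{aligned}
D_{gap}(u, \mathfrak{M}_{fd}, \delta)
&= \sup_{h \in \mathfrak{M}_{fd}}\ \sup_{\|d\|_\infty \le \delta} \operatorname{diam}_{gap} S_\infty(\mathfrak{M}_{fd}, u, h * u + d, \delta) \\
&\ge \operatorname{diam}_{gap} S_\infty(\mathfrak{M}_{fd}, u, 0, \delta) \\
&= \sup_{g \in \mathfrak{M}_{fd},\ \|g * u\|_\infty \le \delta} 2\,\delta(g, 0)
   \quad \text{since } \delta(g, 0) = \delta(-g, 0) \\
&\ge \sup_{g \in \mathfrak{M}_{fd},\ \|g * u\|_\infty \le \delta} \frac{2\,\|g\|_{H_\infty}}{\sqrt{1 + \|g\|_{H_\infty}^2}}
   \quad \text{using the lower bound to the gap} \\
&\ge \frac{2\delta}{\sqrt{1 + \delta^2}}
   \quad \text{choosing } g \text{ to be an impulse with magnitude } \delta.
\end{aligned}$$

□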
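
To see how tight these bounds are, one can tabulate the lower bound on the gap diameter obtained at the end of the proof above against the upper bound of Theorem 4.16. This is a minimal numerical sketch; the function names are illustrative, and the upper bound applies only to the specific two-input experiment discussed earlier, while the lower bound holds for any bounded inputs.

```python
import math

def gap_lower_bound(norm_h):
    """Lower bound on the gap between a plant and the zero plant,
    given the plant's H-infinity norm: x / sqrt(1 + x^2)."""
    return norm_h / math.sqrt(1.0 + norm_h ** 2)

def diameter_bounds(delta):
    """(lower, upper) bounds on the gap diameter of information at noise level delta."""
    lower = 2.0 * gap_lower_bound(delta)   # final line of the proof's inequality chain
    upper = 2.0 * delta                    # Theorem 4.16
    return lower, upper

for delta in (0.01, 0.1, 0.5, 1.0):
    lower, upper = diameter_bounds(delta)
    print(f"delta={delta:<5} lower={lower:.4f} upper={upper:.4f} "
          f"ratio={upper / lower:.4f}")
```

The ratio of the two diameter bounds is $\sqrt{1 + \delta^2}$, which tends to 1 as the noise level shrinks.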


Finally,  we  note  that  this  theorem  has  interesting  impli¬ 
cations  to  identification  in  the  closed  loop.  To  accurately 
estimate  the  plant,  it  is  necessary  that  the  input  satisfies 
the  conditions  in  Theorem  4.14.  In  general  it  is  not  known 
whether  there  exists  one  input  with  that  property.  If  not, 
then  more  information  about  the  model  set  should  be 
known.  An  example  of  such  information  is  the  knowledge 
of  a  stabilizing  controller  of  the  plant  to  be  identified. 
Details  on  this  can  be  found  in  [29],  [39]. 

V.  Conclusions 

In  this  paper,  we  have  approached  the  problem  of 
analyzing  the  intrinsic  limitations  of  identification  by  con¬ 
sidering  the  optimal  worst-case  asymptotic  error  achiev¬ 
able  using  any  input  and  any  identification  algorithm.  This 
gives  an  intrinsic  measure  of  the  difficulty  of  identifica¬ 
tion,  given  the  a  priori  knowledge  (model  set  and  distur¬ 
bance  class)  and  the  constraints  on  the  allowable  experi¬ 
ments  (input  class). 

The  analysis  is  performed  in  two  steps.  First,  for  fixed 
inputs,  a  lower  bound  on  the  error  of  any  identification 
algorithm  is  expressed  in  terms  of  the  diameter  of  the 
worst-case  infinite-horizon  uncertainty  set,  and  it  was 
shown  that  under  some  compactness  conditions  on  the 
model  set,  there  exist  algorithms  which  achieve  to  within  a 
factor  of  two  of  this  bound  asymptotically.  These  results 
hold  for  any  error  metric  and  disturbance  norm.  Second, 
for specific identification problems, a characterization of
inputs which make this infinite-horizon diameter of information
small is given. In particular, we considered identification
in both the $\ell_1$ and the $H_\infty$ norms for stable plants,
and in the gap metric for unstable finite-dimensional
plants of arbitrary order. The significance of these error
metrics is that if the worst-case error is small in these
metrics, methods exist for synthesizing controllers to
achieve robust performance [2], [4].

The  results  show  that  accurate  identification  is  possible 
in  the  worst  case  for  a  specific  choice  of  inputs  depending 
on the model set. For identification in the $\ell_1$ norm,
algorithms for computing estimates are based on linear
programming and are easily implementable. For
identification in the gap metric, robust identification was shown
to  be  more  or  less  equivalent  to  stability  testing.  This  has 
important  implications  on  closed-loop  identification  in 
which  one  does  not  have  direct  access  to  the  input. 

There  are  many  issues  in  worst-case  identification  that 
need  to  be  resolved.  The  issue  of  computational  complex¬ 
ity  and  implementation  of  the  algorithm  is  a  central  issue. 
In  particular,  it  is  beneficial  to  relate  the  complexity  of 
the  model  set  to  the  complexity  of  the  required  experi¬ 
ments  and  the  algorithms.  Another  issue  is  the  relation¬ 
ship  between  the  identification  in  the  frequency  domain 
and  the  time  domain,  particularly  as  it  relates  to  algo¬ 
rithms  and  complexity.  Deeper  study  of  identification  of 
unstable plants in a closed-loop setting is needed. The
relation of all of this to adaptive control is of course one
of the prime motivations for this work and will be the
subject of future research.


Appendix

A. Proof of Theorem 4.14

To prove this result, we need the following lemma, the
proof of which is elementary but tedious, and can be
found in [38].

Lemma A.1: Let $u \in B\ell_\infty$ and let $h$ be a complex-valued
impulse response (i.e., the sequence values can be
complex) with a strictly proper rational transfer function

$$H(z) = \frac{n(z)}{(z - e^{j\omega})^M},$$

where $n(z)$ is a polynomial of degree less than $M$.
(It has a single pole repeated $M$ times at $e^{j\omega}$.) Then:

1) If $u$ excites at frequency $\omega$, the output $u * h$ is
unbounded.

2) If $u$ does not excite at $\omega$ and $M = 1$ (the pole is
simple), the output $u * h$ is bounded.

Armed with this lemma, we can now prove Theorem 4.14.

Proof:

(if part)

Let $u^{(1)}, u^{(2)}, \ldots, u^{(N)} \in B\ell_\infty$ be $N$ inputs satisfying
properties 1) and 2). Let $h \in \mathfrak{M}_{fd}$ with a rational
z-transform $H(z)$, and assume that the outputs
$y^{(i)} = u^{(i)} * h$, $i = 1, \ldots, N$, are all bounded. We shall show
that $h$ must be stable.

Suppose that $H(z)$ has a pole $z = z_1$ inside the open
unit disk. Since the inputs have no common zeros,
one of the inputs, say $u^{(i)}$, has no zero at $z = z_1$. Hence,
the output $y^{(i)}$ must have a pole at $z = z_1$, and therefore
cannot be bounded.

Thus, $H$ can only have poles on or outside of the unit
circle. Write

$$H(z) = H_u(z) + H_s(z) \qquad (A.20)$$

where $H_s(z)$ contains the stable poles (outside the unit
circle) and the finite impulse response (FIR) part of $H(z)$,
and $H_u(z)$ is strictly proper with all poles on the unit
circle. Let $h_u$ and $h_s$ be the inverse transforms of $H_u$ and
$H_s$, respectively. Since the output $u * h_s$ corresponding to
the stable part must be bounded, one needs only to verify
that the boundedness of $u^{(i)} * h_u$ for every $i$ implies
$h_u = 0$.

Suppose that $H_u$ is not identically 0 and has $L > 0$
poles (counting multiplicities) on the unit circle at distinct
frequencies $\omega_1, \ldots, \omega_M$. Then $H_u(z)$ can be decomposed as

$$H_u(z) = \sum_{i=1}^{M} H_i(z) \qquad (A.21)$$

where

$$H_i(z) = \frac{n_i(z)}{(z - e^{j\omega_i})^{L_i}} \qquad (A.22)$$

with $n_i(z)$ a polynomial of degree less than $L_i$,
and $L_i$ is the order of the pole at $z = e^{j\omega_i}$.

Consider a minimal state space realization of the system
with transfer function $H_u(z)$, where the states $x$
consist of the modes corresponding to each pole of the
system. The dimension of the realization is $L$ and some of
the states are complex but they occur in conjugate pairs.
(These correspond to conjugate poles.) Since $\bigcup_i \Omega(u^{(i)}) = [0, 2\pi]$,
the frequency $\omega_1$ lies in $\Omega(v)$ for some input
$v \in \{u^{(1)}, \ldots, u^{(N)}\}$. By Lemma A.1,

$$y^{(1)} = v * h_1 \notin \ell_\infty \qquad (A.23)$$

where $h_1$ is the impulse response whose z-transform is
$H_1(z)$.

If $x^{(1)}$ are the modal states (of dimension $L_1$) corresponding
to this pole at $e^{j\omega_1}$, the system $h_1$ can be
realized minimally as

$$x^{(1)}_{n+1} = A_1 x^{(1)}_n + B_1 v_n, \qquad y^{(1)}_n = C_1 x^{(1)}_n \qquad (A.24)$$

for some matrices $A_1, B_1, C_1$.
Since $y^{(1)}$ is unbounded but $v$ is bounded, it follows
from (A.24) that the modal states $x^{(1)}$ must be unbounded
given input $v$. But the overall state $x$ for the entire system
$H_u(z)$ is an aggregation of the modal states and hence
must become unbounded too when input $v$ is applied. The
last step is to show that this implies that the output of the
overall system must be unbounded also.

Let the minimal state space realization of $H_u$ be

$$x_{n+1} = A x_n + B v_n, \qquad y_n = C x_n. \qquad (A.25)$$

From (A.25), a sequence of equations is obtained as

$$\begin{aligned}
y_n &= C x_n \\
y_{n+1} &= C A x_n + C B v_n \\
&\ \,\vdots \\
y_{n+L-1} &= C A^{L-1} x_n + \sum_{m=0}^{L-2} C A^{L-2-m} B v_{n+m}.
\end{aligned}$$

Let

$$Y_n = \begin{pmatrix} y_n \\ y_{n+1} \\ \vdots \\ y_{n+L-1} \end{pmatrix}, \quad
Q_0(A, C) = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{L-1} \end{pmatrix}, \quad
V_n = \begin{pmatrix} v_n \\ v_{n+1} \\ \vdots \\ v_{n+L-1} \end{pmatrix}, \quad
E = \begin{pmatrix} 0 & & & \\ CB & 0 & & \\ \vdots & \ddots & \ddots & \\ CA^{L-2}B & \cdots & CB & 0 \end{pmatrix}.$$

The sequence of output equations can then be written
as

$$Y_n = Q_0(A, C)\, x_n + E V_n. \qquad (A.26)$$

Note that $Q_0(A, C)$ is the observability matrix of the
system; by the minimality of the realization, $Q_0(A, C)$ is
invertible. Since $x_n$ becomes unbounded and $V_n$ is
bounded, the output $Y_n$ must also be unbounded. This
contradicts our original assumption and hence $H_u = 0$.
The original system $h$ must be stable and the inputs
can test stability in $\mathfrak{M}_{fd}$.

(only-if part)

We now show that the two conditions for the inputs are
also necessary to test the stability in $\mathfrak{M}_{fd}$.

Suppose the first condition is not satisfied: consider an
$\omega_0 \in [0, 2\pi]$ but $\omega_0 \notin \bigcup_{i=1}^{N} \Omega(u^{(i)})$. Consider the unstable
system $h_n = \cos(n\omega_0)$. Lemma A.1, part 2), implies that
$u^{(i)} * e^{jn\omega_0}$ is bounded for all $i$. Since $u^{(i)} * h$ is the real
part of $u^{(i)} * e^{jn\omega_0}$, it is also bounded for all $i$. Thus, the
inputs cannot test stability in $\mathfrak{M}_{fd}$. This shows that the
first condition is necessary.

Now suppose that the second condition is not satisfied,
so that there exists some $z_0 = r_0 e^{j\omega_0}$ ($0 < r_0 < 1$) which is
a common zero in the open unit disk of the z-transforms
of all the inputs; that is,

$$\sum_{k=0}^{\infty} u^{(i)}_k r_0^k e^{jk\omega_0} = 0, \quad \forall i. \qquad (A.27)$$

Since the inputs are real, their zeros occur as conjugate
pairs, i.e.,

$$\sum_{k=0}^{\infty} u^{(i)}_k r_0^k e^{-jk\omega_0} = 0, \quad \forall i. \qquad (A.28)$$

Now consider the unstable finite-dimensional system
$h_n = r_0^{-n} \cos(n\omega_0)$. For each $i$, $n$,

$$\left| (u^{(i)} * h)_n \right|
= \left| \sum_{k=0}^{n} u^{(i)}_k r_0^{-(n-k)} \cos((n - k)\omega_0) \right|
= \frac{r_0^{-n}}{2} \left| e^{jn\omega_0} \sum_{k=0}^{n} u^{(i)}_k r_0^k e^{-jk\omega_0}
+ e^{-jn\omega_0} \sum_{k=0}^{n} u^{(i)}_k r_0^k e^{jk\omega_0} \right|.$$

By (A.27) and (A.28), each partial sum equals the negative of its
tail $\sum_{k > n}$, whose magnitude is at most $r_0^{n+1}/(1 - r_0)$ because
$|u^{(i)}_k| \le 1$. Therefore $|(u^{(i)} * h)_n| \le r_0/(1 - r_0)$ for all $n$.
Thus the output for each of the inputs is bounded. Hence,
the inputs $u^{(i)}$ cannot test the stability in $\mathfrak{M}_{fd}$. □

B. An Inequality Between the Gap and $H_\infty$ Distances

Proposition B.1: Let $h$ and $g$ be two plants. Then
$\delta(g, h) \le \|h - g\|_{H_\infty}$.

Proof: We assume that $\|h - g\|_{H_\infty} < \infty$; otherwise
there is nothing to prove. Now,

$$\delta(g, h) = \max\{\vec{\delta}(G_g, G_h),\ \vec{\delta}(G_h, G_g)\}$$

where

$$G_h = \{(u, h * u) \in \ell_2 \times \ell_2 : u \in \ell_2,\ h * u \in \ell_2\}$$

and

$$\vec{\delta}(G_h, G_g) = \sup_{x \in G_h,\ \|x\|_2 \le 1}\ \inf_{y \in G_g} \|x - y\|_2.$$

Now, since $\|g - h\|_{H_\infty} < \infty$,

$$(u, h * u) \in G_h \iff (u, g * u) \in G_g.$$

We have

$$\begin{aligned}
\vec{\delta}(G_h, G_g)
&\le \sup_{h * u \in \ell_2,\ \|u\|_2 \le 1}\ \inf_{y \in G_g} \|(u, h * u) - y\|_2 \\
&\le \sup_{h * u \in \ell_2,\ \|u\|_2 \le 1} \|(u, h * u) - (u, g * u)\|_2 \\
&= \sup_{h * u \in \ell_2,\ \|u\|_2 \le 1} \|(h - g) * u\|_2 \\
&\le \|h - g\|_{H_\infty}.
\end{aligned}$$

Hence, the result follows. □
Minimization of the Maximum Peak-to-Peak
Gain: The General Multiblock Problem

Ignacio  J.  Diaz-Bobillo  and  Munther  A.  Dahleh 


Abstract — This  paper  presents  a  comprehensive  study  of  the 
general $\ell_1$-optimal multiblock problem, as well as a new linear
programming algorithm for computing suboptimal controllers.
By formulating the interpolation conditions in a concise and
natural way, the general theory is developed in simpler terms
and with a minimum number of assumptions. In addition,
further insight is gained on the structure of the optimal
solution, and different classes of multiblock problems are
distinguished. This leads to a conceptually attractive, iterative
method for finding approximate solutions with the following
properties: 1) it approximates multiblock problems with one-block
problems by delay augmentation, 2) it unifies the treatment of
zero and rank interpolation conditions through robust
computations, 3) it provides upper and lower bounds of the optimal
objective function by solving one finite dimensional linear
program at each iteration, 4) for a class of problems, it generates
suboptimal controllers that achieve the upper bound without
order inflation, 5) both bounds as well as the solution converge
to the optimal, 6) it does not require the existence of polynomial
feasible solutions, and 7) it gives information about the support
structure of the optimal solution.

Notation 

Let $X$ be a real normed vector space, then $X^*$ denotes
the dual space of $X$ containing all bounded linear
functionals on $X$.

$\ell_1$  Space of absolutely summable sequences supported
on the nonnegative integers. If $x \in \ell_1$
then $\|x\|_1 := \sum_{k=0}^{\infty} |x(k)| < \infty$.

$\ell_1^{p \times q}$  Space of $p \times q$ matrices with entries in $\ell_1$. If
$M = (m_{ij}) \in \ell_1^{p \times q}$, then
$\|M\|_1 := \max_{1 \le i \le p} \sum_{j=1}^{q} \|m_{ij}\|_1$.

$\ell_\infty$  Space of all bounded sequences of real numbers
supported on the nonnegative integers. If $x \in \ell_\infty$
then $\|x\|_\infty := \sup_k |x(k)| < \infty$.

$\ell_\infty^{p \times q}$  Space of $p \times q$ matrices with entries in $\ell_\infty$. If
$M = (m_{ij}) \in \ell_\infty^{p \times q}$, then
$\|M\|_\infty := \sum_{i=1}^{p} \max_{1 \le j \le q} \|m_{ij}\|_\infty$.
Note that $\ell_\infty^{p \times q} = (\ell_1^{p \times q})^*$.

$c_0^{p \times q}$  Subspace of $\ell_\infty^{p \times q}$ consisting of all elements
whose entries decay to zero, i.e., $\lim_{k \to \infty} m_{ij}(k) = 0$
for all $(i, j)$. Note that $(c_0^{p \times q})^* = \ell_1^{p \times q}$.

$\lambda$  Complex variable representing the unit delay.
Given $M \in \ell_1^{p \times q}$, define $M(\lambda) := \sum_{k=0}^{\infty} M(k) \lambda^k$
as the $\lambda$-transform of $M$.

$\mathcal{D}$  The open unit disk.

$P_k$  The truncation operator on sequences. Hence, if
$x = \{x(i)\}_{i=0}^{\infty}$ is any sequence, then
$P_k x = \{x(0), x(1), \ldots, x(k), 0, \ldots\}$.

$S_k$  Right shift by $k$ positions. If $x = \{x(i)\}_{i=0}^{\infty}$ is any
sequence and $k$ is a nonnegative integer, then
$S_k x = \{0, \ldots, 0, x(0), x(1), \ldots\}$.

Given a matrix $M$, $(M)_i$ will denote its $i$th row and $(M)^j$
its $j$th column.
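
The truncation and shift operators act on infinite sequences, but their effect is easy to illustrate on finite prefixes. The following is a minimal sketch, assuming sequences are stored as Python lists indexed from 0; the function names `truncate` and `shift_right` are illustrative.

```python
def truncate(x, k):
    """P_k: keep x(0), ..., x(k) and replace every later entry with 0."""
    return [xi if i <= k else 0 for i, xi in enumerate(x)]

def shift_right(x, k):
    """S_k: prepend k zeros, so the stored prefix grows by k entries."""
    return [0] * k + list(x)

x = [1, 2, 3, 4]
print(truncate(x, 1))       # keeps the first two entries
print(shift_right(x, 2))    # delays the sequence by two steps
```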


I.  Introduction 

DESIGN  specifications  for  practical  control  problems 
are  often  most  naturally  expressed  in  terms  of 
time-domain  bounds  on  the  amplitude  of  signals  (exoge¬ 
nous  disturbances  and  regulated  outputs).  This  observa¬ 
tion  has  led  to  the  introduction  of  a  new  optimization 
problem in the context of control system design. In [37]
Vidyasagar formulated the $\ell_1$-optimal control problem. In
contrast with the $H_\infty$ problem, the $\ell_1$-optimal design has as
objective the minimization of the maximum peak-to-peak
gain of a closed-loop system that is driven by bounded
amplitude disturbances.

In 1987-1988, Dahleh and Pearson introduced some
basic results on the theory of $\ell_1$ optimization. In [9] the
solution to the $\ell_1$-optimal control problem was presented
for the special case of square (i.e., one-block) systems.
Then, in [11] Dahleh et al. presented the central ideas for
the solution of nonsquare (i.e., multiblock) problems,
including a method to compute approximate suboptimal
solutions iteratively. This method is based on the solution
of a linear program representing a truncated version of
the original problem. Similar results extending these ideas
to the continuous-time domain were introduced by the
same authors in [10], as well as a solution to the fixed
input optimization problem [12].

These results brought considerable attention to the
problem of $\ell_1$ optimization. In [29] a general treatment of
the multiblock case was presented, where the optimal
solution is shown to exist under some assumptions.
Independently, in [6] and [33] a method was introduced to
compute lower bounds on the optimal norm by solving a
complementary linear program. A direct linear programming
formulation (in the primal space) was presented in
[30]. Also, [34] gave a nice account of some convergence
properties and pointed to interesting deficiencies in
the theory. In [17], [18] the full state-feedback problem
was addressed.

In the area of robustness, considerable advancement
was made too. In [13], the necessity of the small gain
theorem in the $\ell_1$ context was analyzed. Also, [24] presented
necessary and sufficient conditions for robust performance
and robust stability under structured time-varying
perturbations. It turns out that such conditions are
relatively easy to compute, making the theory more attractive
from the point of view of applications. Other related
work can be found in [8], [6], [3], [19], [14], [32].

The present investigation is motivated by the lack of a
solid understanding of the general $\ell_1$ multiblock problem.
While various aspects of the theory are well understood,
the structure of the optimal solution in the general multiblock
case is not. As a result, solution methods which are
based on a straightforward truncation of the full problem
suffer from significant deficiencies. Most importantly, they
generate a sequence of suboptimal controllers of increasing
order, and miss the structure of the (possibly low
order) optimal controller. This issue was pointed out quite
nicely in [33] where exact solutions of low order were
computed for some examples. From a practical point of
view, such a truncation method translates into high order
controllers even for the simplest multiblock problems. At
the same time, it requires the existence of feasible
closed-loop maps with finite pulse response, a condition
that many control problems lack.

In this paper we present a comprehensive treatment of
the general $\ell_1$-optimal multiblock problem. Contributions
are  made  in  the  general  theory  as  well  as  in  the  approxi¬ 
mate  methods  of  solution.  With  regard  to  the  problem 
formulation,  a  more  compact  and  natural  way  of  charac¬ 
terizing  the  interpolation  conditions  of  the  general  multi¬ 
block  problem  is  presented.  It  has  the  advantage  of  sim¬ 
plifying  many  of  the  proofs  and  avoiding  unnecessary 
assumptions  (compared  to  previous  work  [29],  [34]).  We 
also  present  a  new  solution  method  for  the  general  multi¬ 
block  problem  with  the  following  characteristics: 

1)  Approximates  multiblock  problems  with  one-block 
problems  by  delay  augmentation,  thus  exploiting  the  char¬ 
acteristics  of  the  optimal  solutions  of  such  problems. 

2)  Applies  results  from  matrix  theory  [21]  in  the  com¬ 
putation  of  interpolation  conditions. 

3)  With  each  approximation  (requiring  the  solution  of 
only  one  linear  program),  the  method  provides  upper  and 
lower  bounds  of  the  optimal  norm. 

4)  Under  mild  assumptions,  both  bounds  converge  to 
the  optimal  value  of  the  norm. 

5)  With  each  approximation  the  method  generates  a 
feasible  (i.e.,  stabilizing)  controller  that  achieves  the  up¬ 
per  bound. 

6)  For  a  special  class  of  multiblock  problems  the  solu¬ 
tions  are  exact. 

7)  For  a  larger  class  of  multiblock  problems  the  se¬ 
quence  of  suboptimal  controllers  does  not  suffer  from 
order  inflation. 


Also,  a  result  is  presented  relating  the  support  charac¬ 
teristics  of  the  optimal  and  approximate  solution  of  multi¬ 
block  problems,  followed  by  a  stronger  conjecture.  These 
results  are  complemented  by  a  broad  range  of  numerical 
examples, including a case study where the $\ell_1$ and $H_\infty$
solutions to the pitch axis control of the X29 aircraft are
compared.

The  paper  is  organized  as  follows:  in  Section  II  the 
general $\ell_1$-optimal control problem is defined. The new
interpolation conditions are presented in Section III as
well as computational procedures. This is followed by an
existence result with minimum assumptions in Section IV.
Next, we establish the equivalence between $\ell_1$ optimization
and infinite dimensional linear programming in Section V.
Section VI contains the solution to one-block
problems.  The  results  in  this  section  are  an  extension  of 
those  in  [29].  Section  VII  presents  (approximate)  methods 
of  solution  to  multiblock  problems.  In  particular,  the 
delay  augmentation  method  is  introduced  along  with  its 
convergence  properties.  Illustrations  and  examples  are 
contained  in  Section  VIII.  In  Sections  IX  and  X,  we 
present  a  few  results  and  observations  (including  a  conjec¬ 
ture)  on  the  support  characteristics  of  these  approximate 
solutions.  Finally,  we  treat  the  X29  synthesis  problem  in 
Section  XI  followed  by  the  conclusions  in  Section  XII. 

II.  Problem  Formulation 

The setup corresponds to the standard disturbance rejection problem formulated as a linear fractional transformation from the disturbance input to the regulated output, with the controller in the lower loop (see Fig. 1). In particular, we consider the discrete time case, with the inputs and outputs being sequences of vectors. The problem is represented via an LTI finite-dimensional operator, $G$, that maps the disturbance vector $w$ of dimension $n_w$, and the control vector $u$ of dimension $n_u$, to the regulated output vector $z$ of dimension $n_z$, and the measurement vector $y$ of dimension $n_y$. Thus, with the appropriate partitioning,

$$\begin{pmatrix} z \\ y \end{pmatrix} = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \begin{pmatrix} w \\ u \end{pmatrix}. \qquad (1)$$

The controller action is represented by the operator $K$ that maps the measurement sequence to the control sequence, i.e., $u = Ky$. The closed-loop map from the disturbance to the regulated output, denoted $\Phi$, is given by:

$$\Phi = G_{11} + G_{12}K(I - G_{22}K)^{-1}G_{21}. \qquad (2)$$

The $\ell_1$-optimal control problem can be stated as follows: among all internally stabilizing controllers, find the one that minimizes the maximum peak-to-peak gain of $\Phi$ operating on the space of bounded disturbances with unit norm. That is,

$$\mu^0 := \inf_{K\ \mathrm{stab}}\ \sup_{\max_{1\le j\le n_w}\|w_j\|_\infty \le 1} \Big( \max_{1\le i\le n_z} \|(\Phi w)_i\|_\infty \Big) = \inf_{K\ \mathrm{stab}} \|\Phi\|_1. \qquad (3)$$

Fig. 1. The standard problem.

In the above we have used the fact that the induced norm of an operator mapping bounded sequences in $\ell_\infty^{n_w}$ to bounded sequences in $\ell_\infty^{n_z}$ is given by the $\ell_1$ norm.
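As an illustration of this fact, the following minimal sketch (not taken from the paper) evaluates the $\ell_1$ norm of a finite pulse response, assuming the usual definition $\|\Phi\|_1 = \max_i \sum_j \sum_k |\phi_{ij}(k)|$, and compares it with the peak output produced by a unit-amplitude input aligned with the pulse response. The function names and the array layout `(time, n_z, n_w)` are assumptions made only for this example.

```python
import numpy as np

def l1_norm(phi):
    """l1 norm of a finite pulse response phi with shape (T, n_z, n_w)."""
    # Sum |phi_ij(k)| over time k and input j, then take the worst output row i.
    return np.abs(phi).sum(axis=(0, 2)).max()

def worst_case_peak(phi):
    """Output peak at time T-1 for a unit-amplitude input aligned with phi."""
    T = phi.shape[0]
    i = np.abs(phi).sum(axis=(0, 2)).argmax()   # worst output channel
    # w_j(k) = sign(phi_ij(T-1-k)) makes every convolution term nonnegative at time T-1.
    w = np.sign(phi[::-1, i, :])                # shape (T, n_w)
    return sum(phi[k, i, :] @ w[T - 1 - k] for k in range(T))

# Hypothetical 1-output, 2-input pulse response of length 2.
phi = np.array([[[1.0, -2.0]], [[0.5, 0.0]]])
print(l1_norm(phi), worst_case_peak(phi))
```

For a finite pulse response the two numbers coincide, which is the finite-horizon version of the statement that the peak-to-peak gain equals the $\ell_1$ norm.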

It is well known that a simpler description of the set of all (internally) stable closed-loop maps is obtained via a parameterization of all stabilizing controllers [38]. Such a parameterization provides an affine expression, mapping an operator space to the set of all internally stable closed-loop maps:

$$\Phi = H - UQV \qquad (4)$$

where $H \in \ell_1^{n_z\times n_w}$, $U \in \ell_1^{n_z\times n_u}$, and $V \in \ell_1^{n_y\times n_w}$ are functions of the problem data (i.e., the operator $G$), and $Q$ is a free parameter in $\ell_1^{n_u\times n_y}$ (i.e., stable). Furthermore, if $G$ is LTI and finite dimensional, so are $H$, $U$, and $V$. Then, for any $Q \in \ell_1^{n_u\times n_y}$, a controller can be computed that achieves the corresponding closed-loop map, $\Phi$.

Consequently, the $\ell_1$ problem can be redefined as a minimum distance problem in $\ell_1^{n_z\times n_w}$:

$$\mu^0 := \inf_{R\in\mathcal{S}} \|H - R\|_1 = \inf \|\Phi\|_1 \qquad (5)$$

where

$$\mathcal{S} := \{R \in \ell_1^{n_z\times n_w} \mid R = UQV \text{ for some } Q \in \ell_1^{n_u\times n_y}\}. \qquad (6)$$

The subspace $\mathcal{S}$ contains the set of feasible $R$'s. Also, from duality theory [26], problem (5) can be posed in the dual space of $\ell_1^{n_z\times n_w}$, that is, $\ell_\infty^{n_z\times n_w}$, as the following maximization problem:

$$\mu^0 = \max_{G\in\mathcal{S}^\perp,\ \|G\|_\infty\le 1} \langle H, G\rangle \qquad (7)$$

where $\langle H, G\rangle$ is the value of the bounded linear functional $G$ at the point $H$:

$$\langle H, G\rangle = \sum_{i=1}^{n_z}\sum_{j=1}^{n_w}\sum_{k=0}^{\infty} h_{ij}(k)\,g_{ij}(k)$$

and $\mathcal{S}^\perp$ is the right annihilator of $\mathcal{S}$:

$$\mathcal{S}^\perp = \{G \in \ell_\infty^{n_z\times n_w} \mid \langle R, G\rangle = 0\ \ \forall R \in \mathcal{S}\}.$$

Furthermore, if a solution to (5) exists, say $\Phi^0$, then it is aligned with every solution $G^0$ to (7), that is $\langle \Phi^0, G^0\rangle = \|\Phi^0\|_1\|G^0\|_\infty$. This implies that $\Phi^0$ and $G^0$ must satisfy the following alignment conditions:

i) if $|g^0_{ij}(t)| < \max_{1\le j\le n_w}\|g^0_{ij}\|_\infty$ then $\phi^0_{ij}(t) = 0$,

ii) $\phi^0_{ij}(t)\,g^0_{ij}(t) \ge 0$,

iii) let $I = \{i \in \{1,2,\ldots,n_z\} \mid \max_{1\le j\le n_w}\|g^0_{ij}\|_\infty = 0\}$, then $\|(\Phi^0)_i\|_1 = \mu^0$ for all $i$ not in $I$,

iv) for all $i \in I$, $(\Phi^0)_i$ can be anything such that $\|(\Phi^0)_i\|_1 \le \mu^0$.

The next section studies the solvability of the equation $R = UQV$.

III. Interpolation Conditions

Here we take some of the ideas in [11] and [29], and present a natural and compact description of the interpolation conditions for the most general MIMO case.

The notion of interpolation conditions can be viewed in at least two ways: as algebraic conditions on the matrix $R(\lambda)$ so that it belongs to the range of $UQV$, or as conditions on the nullspace of the operator $R$. Here we are going to exploit the algebraic notion although, for the purpose of computations, we view the interpolation conditions as a nullspace matching problem.

In the sequel it will be assumed, without loss of generality, that $U(\lambda)$ has full column rank (i.e., rank $n_u$ for almost all $\lambda$) and $V(\lambda)$ has full row rank (i.e., rank $n_y$ for almost all $\lambda$). Violation of these assumptions implies that there are redundancies in the controls and/or the measurements which can be easily removed.

First, a simple but useful result from complex variable theory is presented, where $(\cdot)^{(k)}(\lambda_0)$ denotes the $k$th order derivative with respect to $\lambda$, evaluated at $\lambda_0$.

Lemma 3.1: Given a function $f(\cdot)$ of the complex variable $\lambda$ analytic in the open unit disk $\mathcal{D}$, then $(f)^{(k)}(\lambda_0) = 0$ for $k = 0,1,\ldots,(\sigma - 1)$ for $\lambda_0 \in \mathcal{D}$ if and only if $f(\lambda) = (\lambda - \lambda_0)^{\sigma} g(\lambda)$ where $g(\cdot)$ is analytic in $\mathcal{D}$.

Next consider Smith-McMillan decompositions of the rational matrices $U$ and $V$. (Note: to simplify notation, the complex variable argument will be omitted in most expressions)


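$$U = L_U M_U R_U, \qquad V = L_V M_V R_V \qquad (8)$$

where $L_U$, $R_U$, $L_V$, and $R_V$ are (polynomial) unimodular matrices. Under the rank assumptions on $U$ and $V$ the rational matrices $M_U$ and $M_V$ have the following diagonal structure:

$$M_U = \begin{pmatrix} \mathrm{diag}\!\left(\dfrac{\epsilon_1}{\psi_1},\ldots,\dfrac{\epsilon_{n_u}}{\psi_{n_u}}\right) \\ 0 \end{pmatrix} \qquad (9)$$

$$M_V = \begin{pmatrix} \mathrm{diag}\!\left(\dfrac{\epsilon'_1}{\psi'_1},\ldots,\dfrac{\epsilon'_{n_y}}{\psi'_{n_y}}\right) & 0 \end{pmatrix}. \qquad (10)$$

Let $\lambda_0$ be a zero of $U(\lambda)$. Let $\sigma_{U_i}(\lambda_0)$ denote the multiplicity of $\lambda_0$ as a root of $\epsilon_i(\lambda)$, then $\{\sigma_{U_i}(\lambda_0)\}_{i=1}^{n_u}$ defines a nondecreasing sequence of nonnegative integers. For a given $i \in \{1,2,\ldots,n_u\}$, $\sigma_{U_i}(\lambda_0)$ is known as the algebraic multiplicity of $\lambda_0$. The total number of indexes $i$ for which $\sigma_{U_i}(\lambda_0)$ is strictly positive is known as the geometric multiplicity of $\lambda_0$. Similarly, define $\{\sigma_{V_j}(\lambda_0)\}_{j=1}^{n_y}$ for $V(\lambda)$.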

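Let $\Lambda_{UV}$ denote the set of zeros of $U$ and $V$ in the closed unit disk $\bar{\mathcal{D}}$. In order to prove the interpolation theorem (i.e., apply the results of Lemma 3.1) we need the following assumption.

Assumption 1: $\Lambda_{UV} \subset \mathcal{D}$, where $\mathcal{D}$ is the open unit disk.

Consider the unimodular matrices in (8). Since their inverses are polynomials, one can define the following polynomial row and column vectors:

$$\alpha_i(\lambda) = (L_U^{-1})_i(\lambda), \quad i = 1,2,\ldots,n_z \qquad (11)$$

$$\beta_j(\lambda) = (R_V^{-1})^j(\lambda), \quad j = 1,2,\ldots,n_w. \qquad (12)$$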

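Now we are ready to present the main interpolation theorem. These conditions are different from those in [29] and do not require coprime factorizations.

Theorem 3.1: Given $R \in \ell_1^{n_z\times n_w}$, there exists $Q \in \ell_1^{n_u\times n_y}$ such that $R = UQV$ if and only if for all $\lambda_0 \in \Lambda_{UV}$ the following conditions are satisfied:

i) $(\alpha_i R \beta_j)^{(k)}(\lambda_0) = 0$ for $i = 1,\ldots,n_u$; $j = 1,\ldots,n_y$; $k = 0,\ldots,\sigma_{U_i}(\lambda_0) + \sigma_{V_j}(\lambda_0) - 1$,

ii) $(\alpha_i R)(\lambda) \equiv 0$ for $i = n_u + 1,\ldots,n_z$, and $(R\beta_j)(\lambda) \equiv 0$ for $j = n_y + 1,\ldots,n_w$.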

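Proof: Consider the following factorization of $M_U$ and $M_V$ (where 0 denotes a block of zeros of appropriate dimensions):

$$M_U = \begin{pmatrix} \mathcal{E}_U\Psi_U \\ 0 \end{pmatrix}; \qquad M_V = \begin{pmatrix} \Psi_V\mathcal{E}_V & 0 \end{pmatrix}$$

where $\mathcal{E}_U$ and $\mathcal{E}_V$ retain the zeros in $\Lambda_{UV}$ while $\Psi_U$ and $\Psi_V$ capture the stable (i.e., minimum-phase) zeros of $U$ and $V$ along with their (stable) poles. Thus, both $\Psi_U$ and $\Psi_V$ are invertible in $\ell_1$. Then,

$$R = L_U \begin{pmatrix} \mathcal{E}_U \bar{Q}\, \mathcal{E}_V & 0 \\ 0 & 0 \end{pmatrix} R_V$$

where $\bar{Q} := \Psi_U R_U Q L_V \Psi_V$. Clearly, $Q \in \ell_1^{n_u\times n_y}$ if and only if $\bar{Q} \in \ell_1^{n_u\times n_y}$. Next, define the following partitions of $L_U$ and $R_V$:

$$L_U = \begin{pmatrix} L_{U,1} & L_{U,2} \end{pmatrix}, \qquad R_V = \begin{pmatrix} R_{V,1} \\ R_{V,2} \end{pmatrix}$$

where $L_{U,1}$ has $n_u$ columns and $R_{V,1}$ has $n_y$ rows. Then, given $R \in \ell_1^{n_z\times n_w}$,

$$\exists\, Q \in \ell_1^{n_u\times n_y} \text{ such that } R = UQV \iff \exists\, \bar{Q} \in \ell_1^{n_u\times n_y} \text{ such that } R = L_{U,1}\,\mathcal{E}_U \bar{Q}\, \mathcal{E}_V R_{V,1}.$$

Necessity of condition i) follows immediately. Take any $i \in \{1,\ldots,n_u\}$ and $j \in \{1,\ldots,n_y\}$, then

$$(\alpha_i R \beta_j)(\lambda) = \prod_{\lambda_0\in\Lambda_{UV}} (\lambda - \lambda_0)^{\sigma_{U_i}(\lambda_0)}\ \bar{q}_{ij}(\lambda) \prod_{\lambda_0\in\Lambda_{UV}} (\lambda - \lambda_0)^{\sigma_{V_j}(\lambda_0)}$$

which implies condition i) by Lemma 3.1 and the fact that $\bar{q}_{ij}$ is in $\ell_1$.

Necessity of condition ii) results from the following: take any $i \in \{n_u + 1,\ldots,n_z\}$ and $j \in \{n_y + 1,\ldots,n_w\}$, then $(\alpha_i R)(\lambda) \equiv 0$ and $(R\beta_j)(\lambda) \equiv 0$ since $(\alpha_i L_{U,1})(\lambda) \equiv 0$ and $(R_{V,1}\beta_j)(\lambda) \equiv 0$.

To show that conditions i) and ii) are sufficient we proceed by backwards construction: by Lemma 3.1, condition i) gives, for $i = 1,\ldots,n_u$ and $j = 1,\ldots,n_y$,

$$(\alpha_i R \beta_j)(\lambda) = \prod_{\lambda_0\in\Lambda_{UV}} (\lambda - \lambda_0)^{\sigma_{U_i}(\lambda_0)}\ w_{ij}(\lambda) \prod_{\lambda_0\in\Lambda_{UV}} (\lambda - \lambda_0)^{\sigma_{V_j}(\lambda_0)}$$

for some $W \in \ell_1^{n_u\times n_y}$, since $R \in \ell_1^{n_z\times n_w}$. Moreover, by condition ii), $(\alpha_i R)(\lambda) \equiv 0$ for $i > n_u$ and $(R\beta_j)(\lambda) \equiv 0$ for $j > n_y$. Therefore, combining these equations into one,

$$L_U^{-1} R R_V^{-1} = \begin{pmatrix} \mathcal{E}_U W \mathcal{E}_V & 0 \\ 0 & 0 \end{pmatrix}$$

which implies that $\bar{Q} = W$ is the solution. ■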

In words, Theorem 3.1 provides a set of algebraic conditions which are necessary and sufficient for $R$ to be feasible (i.e., equivalent to $UQV$ for some stable $Q$). The conditions in i) make sure that the left and right unstable zero structure of the composition $UQV$ is preserved while the conditions in ii) impose the correct (normal) rank conditions on $R$. In fact, it is possible to view the collection of $\alpha_i$'s and $\beta_j$'s for $i > n_u$ and $j > n_y$ as two polynomial bases (not necessarily of minimal degree) for the left and right nullspaces of $R(\lambda)$ (see [23]). By virtue of the Smith-McMillan decomposition these sets of polynomial vectors are linearly independent (over the field of rational functions) so they generate a minimal set of constraints on $R$ (Note: the four-block case has some redundancy which can be eliminated a priori, see [18] for a detailed discussion).

In the sequel, we refer to the conditions in i) as the zero interpolation conditions, and to the conditions in ii) as the rank interpolation conditions. Rank interpolation conditions are also known by the names of relations [11] and convolution conditions [33], [34].

Problems of the form (4) have been traditionally classified in the $\ell_1$ and $\mathcal{H}_\infty$ literature according to the dimensions of the different signal spaces involved. Here we adopt the same classification:

• One-Block Problems: When $n_w = n_y$ and $n_z = n_u$. These are also known as good rank or square problems.

• Two-Block Column Problems: When $n_w = n_y$ and $n_z > n_u$.

• Two-Block Row Problems: When $n_w > n_y$ and $n_z = n_u$.

• Four-Block Problems: When $n_w > n_y$ and $n_z > n_u$.

A problem is labeled multiblock when it is not one-block. Multiblock problems are also known as bad rank problems [11], [29].

Clearly, one-block problems only require zero interpolation conditions and have no rank interpolation conditions, while two-block row (column) problems require right (left) rank interpolation conditions, and four-block problems require both left and right rank interpolation conditions.

A. Computation of Interpolation Conditions

The problem of finding the Smith-McMillan decomposition of rational matrices is at the heart of the interpolation problem. This decomposition has been studied thoroughly due to its strong connections with several important notions in system theory (e.g., multivariable zeros and poles), although mostly from an algebraic point of view [23]. The standard algebraic algorithm to compute such objects is based on the Euclidean division algorithm, known to be numerically sensitive. Nevertheless, there has been some effort in this direction, for example, by using symbolic methods from computer algebra on polynomial matrices [4]. However, it is generally desirable to have algorithms based on the state-space representation of systems, which are more easily implemented on digital computers.

Here we present an alternative approach to the problem of finding the zero interpolation conditions of a square rational matrix. Such an approach avoids the explicit computation of the Smith-McMillan decomposition. Furthermore, it is computationally attractive since it is based on finding the nullspaces of certain Toeplitz-like matrices which are formed directly from the state-space representation of the system.

Although multiblock problems require rank interpolation conditions, we will show that those problems can be posed in such a way that only zero interpolations need to be considered.

In Theorem 3.1 we have shown how the internal stability of the closed-loop system is assured if the zero structure of the left unstable zeros of $U$ and the right unstable zeros of $V$ is preserved in $R$. Such structure is characterized by the zero frequency, its algebraic and geometric multiplicity, and its directional properties as given by the corresponding polynomial vector $\alpha_i$ or $\beta_j$. Despite its numerical problems, the Smith-McMillan decomposition provides the most natural way of characterizing the zero and pole structure of a rational matrix. To circumvent the formal Smith-McMillan decomposition of $U(\lambda)$ and $V(\lambda)$, it is necessary to find an alternative set of conditions that unequivocally defines the zero structure of a rational matrix. Such a set is presented in this section.

The theory of zeros of MIMO systems has been studied extensively, both from an algebraic and state-space perspective [28], [16], [31]. It is well known that a zero of a square system given in state-space form $[A, B, C, D]$ is characterized by the solution of a generalized eigenvalue problem of the form [28]:

$$\begin{pmatrix} A - z_0 I & B \\ C & D \end{pmatrix} \begin{pmatrix} x_0 \\ u_0 \end{pmatrix} = 0$$

where $z_0 = \lambda_0^{-1}$, $x_0$ is known as the state zero direction and $u_0$ is known as the zero input direction. However, the numerical stability of such an eigenvalue problem deteriorates quickly when there are zeros with algebraic multiplicity greater than one. Indeed, such difficulty is equivalent to finding the Jordan decomposition of a defective matrix (i.e., a nondiagonalizable matrix) which is known to be a hard numerical problem [22].

Although it is difficult to obtain the full zero structure directly from the state-space description of a system, the location or frequency of the zeros can be reliably computed [20]. In the sequel, we will assume that the locations of the unstable zeros of the rational (square) matrices $U(\lambda)$ and $V(\lambda)$ are available.

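Next, we introduce a useful definition along with some notation.

Definition 3.1: Given a rational matrix $H(\lambda)$ analytic at $\lambda_0$ and a positive integer $\sigma$, define the following block-lower-triangular Toeplitz matrix:

$$T_{\lambda_0,\sigma}(H) := \begin{pmatrix} H_0 & 0 & 0 & \cdots & 0 \\ H_1 & H_0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ H_{\sigma-1} & H_{\sigma-2} & H_{\sigma-3} & \cdots & H_0 \end{pmatrix}$$

where the $H_i$'s are given by the Taylor expansion of $H(\lambda)$ at $\lambda_0$, that is,

$$H(\lambda) = H_0 + (\lambda - \lambda_0)H_1 + (\lambda - \lambda_0)^2 H_2 + (\lambda - \lambda_0)^3 H_3 + \cdots$$

and $H_i = (1/i!)\,(H)^{(i)}(\lambda_0)$.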
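To make Definition 3.1 concrete, here is a minimal NumPy sketch that assembles $T_{\lambda_0,\sigma}(H)$ from a list of Taylor coefficients. The helper name `block_toeplitz` and the scalar test function are assumptions chosen for illustration; real-valued coefficients (real $\lambda_0$) are assumed.

```python
import numpy as np

def block_toeplitz(coeffs, sigma):
    """Stack Taylor coefficients H_0, ..., H_{sigma-1} into T_{lambda0,sigma}(H)."""
    m, n = coeffs[0].shape
    T = np.zeros((sigma * m, sigma * n))
    for row in range(sigma):
        for col in range(row + 1):
            # Block (row, col) holds H_{row-col}; blocks above the diagonal stay zero.
            T[row*m:(row+1)*m, col*n:(col+1)*n] = coeffs[row - col]
    return T

# Scalar illustration: H(lambda) = (lambda - 0.5)**2 expanded at lambda0 = 0.5,
# so H_0 = 0, H_1 = 0, H_2 = 1.
coeffs = [np.array([[0.0]]), np.array([[0.0]]), np.array([[1.0]])]
T3 = block_toeplitz(coeffs, 3)
print(T3)
```

In this scalar case the only nonzero entry of `T3` is the bottom-left one, reflecting a double zero at $\lambda_0 = 0.5$.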

A numerically stable method was proposed in [36] to find the structural indices associated with poles and zeros of a stable rational matrix $H$, by looking at the rank of $T_{\lambda_0,\sigma}(H)$ as $\sigma$ increases. Such an approach, however, does not provide the directional information necessary to construct the interpolation conditions. Here we present an extension of the ideas in [36] by looking at the structure of the nullspace of $T_{\lambda_0,\sigma}(H)$ for increasing values of $\sigma$. Such an approach has strong connections with the general interpolation theory of rational matrix functions [1], [2]. In particular, it exploits the analyticity of the matrices $U$ and $V$ in the disk.

The following definition establishes some terminology [1].

Definition 3.2: Given an $m \times n$ (real) rational matrix $H(\lambda)$ analytic at $\lambda_0$, a right null chain of order $\sigma$ at $\lambda_0$ is an ordered set of column vectors in $\mathbb{R}^n$, $\{x_1, x_2, \ldots, x_\sigma\}$, such that $x_1 \ne 0$ and

$$T_{\lambda_0,\sigma}(H) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_\sigma \end{pmatrix} = 0.$$

Similarly, a left null chain of order $\sigma$ at $\lambda_0$ is an ordered set of column vectors in $\mathbb{R}^m$, $\{y_1, y_2, \ldots, y_\sigma\}$, such that $y_1 \ne 0$ and

$$T_{\lambda_0,\sigma}(H^T) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_\sigma \end{pmatrix} = 0.$$

The next theorem shows that, if $H$ is square, the existence of a right (left) null chain of order $\sigma$ at $\lambda_0$ is equivalent to the existence of a zero at $\lambda_0$ of algebraic multiplicity $\sigma$. It is an extension of [21, theorem 1.12]. Later, we will establish a complete equivalence between the structure of a zero and the null chains associated with that zero.

Theorem 3.2: A full rank, $n \times n$, rational matrix $H(\lambda)$, analytic at $\lambda_0$, has a zero at $\lambda_0$ of geometric multiplicity $l$ and a sequence of structural indexes equal to, at least, $\sigma_{n-l+1},\ldots,\sigma_n$ ($\sigma_1 = \cdots = \sigma_{n-l} = 0$) if and only if the following conditions hold:

1) There exist $l$ polynomial vectors, $u_1,\ldots,u_l$, such that

$$[Hu_j]^{(k)}(\lambda_0) = 0 \quad \text{for } k = 0,\ldots,\sigma_{n-l+j} - 1, \quad \forall j = 1,\ldots,l.$$

2) The set of vectors $\{u_1(\lambda_0),\ldots,u_l(\lambda_0)\}$ is linearly independent and

$$\mathrm{span}\{u_1(\lambda_0),\ldots,u_l(\lambda_0)\} = \mathcal{N}[H(\lambda_0)]$$

where $\mathcal{N}[\cdot]$ denotes the null space of a matrix.

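Proof: Necessity follows directly from the Smith-McMillan decomposition of $H(\lambda)$:

$$H(\lambda) = L(\lambda)M(\lambda)R(\lambda).$$

Say that the $j$th entry ($j \ge n - l + 1$) on the diagonal of $M$ has a factor $(\lambda - \lambda_0)^{\sigma_j}$. Then, pick $u_{j-n+l}$ to be the $j$th column of $R^{-1}$. With this choice

$$Hu_{j-n+l} = H(R^{-1})^j = (\lambda - \lambda_0)^{\sigma_j} L\rho_{j-n+l} \quad \forall j = n - l + 1,\ldots,n$$

where $\rho_{j-n+l}(\lambda)$ is a rational vector analytic at $\lambda_0$. Clearly, this implies that $(Hu_{j-n+l})^{(k)}(\lambda_0) = 0$ for $k = 0,\ldots,\sigma_j - 1$, and further the set $\{u_1(\lambda_0),\ldots,u_l(\lambda_0)\}$ is linearly independent since $R$ is unimodular and spans the null space of $H(\lambda_0)$.

The proof of sufficiency is not as straightforward. Let $f_j := Hu_j$, $j = 1,\ldots,l$, and define the following auxiliary rational vectors:

$$\gamma_j(\lambda) := (L^{-1}f_j)(\lambda), \qquad v_j(\lambda) := (Ru_j)(\lambda), \qquad j = 1,\ldots,l.$$

Then, we have that $\gamma_j(\lambda) = M(\lambda)v_j(\lambda)$. Note that $u_1(\lambda_0) \cdots u_l(\lambda_0)$ are linearly independent if and only if $v_1(\lambda_0) \cdots v_l(\lambda_0)$ are linearly independent since $R$ is unimodular. Further, since multiplication by a unimodular matrix preserves the zero structure, this direction of the proof can be restated as follows: let $j = 1,\ldots,l$, then

$$\exists\, v_j(\lambda) \text{ such that } v_1(\lambda_0) \cdots v_l(\lambda_0) \text{ are linearly independent and } \gamma_j^{(k)}(\lambda_0) = 0,\ k = 0,\ldots,\sigma_{n-l+j} - 1$$

$$\Downarrow$$

$$\exists\, (\lambda - \lambda_0)^{\sigma_{n-l+j}} \text{ in the } (n - l + j)\text{th diagonal entry of } M(\lambda).$$

Now, it follows from above that

$$\gamma_j(\lambda) = (\lambda - \lambda_0)^{\sigma_{n-l+j}} \rho_j(\lambda).$$

Let $\epsilon_j(\lambda)$, $j = 1,\ldots,n$, be the diagonal entries of the matrix $M$. It immediately follows that:

$$\mathrm{diag}(\epsilon_1,\ldots,\epsilon_n)\,\big(v_1(\lambda) \cdots v_l(\lambda)\big) = \big(\rho_1(\lambda) \cdots \rho_l(\lambda)\big)\,\mathrm{diag}\big((\lambda - \lambda_0)^{\sigma_{n-l+1}},\ldots,(\lambda - \lambda_0)^{\sigma_n}\big). \qquad (15)$$

First, we show that the matrix $\big(v_1(\lambda_0) \cdots v_l(\lambda_0)\big)$ has the structure

$$\begin{pmatrix} 0 \\ \bar{V}(\lambda_0) \end{pmatrix}. \qquad (16)$$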

The top zero block results from the fact that the matrix $M(\lambda_0)$ has a null space of dimension $l$ (otherwise there will be more linearly independent vectors than $l$), hence, $\epsilon_1,\ldots,\epsilon_{n-l}$ do not have zeros at $\lambda_0$. From (15), it follows that for all $\lambda$

$$E\,\bar{V} = \bar{P}\,D, \qquad E := \mathrm{diag}(\epsilon_{n-l+1},\ldots,\epsilon_n), \quad D := \mathrm{diag}\big((\lambda - \lambda_0)^{\sigma_{n-l+1}},\ldots,(\lambda - \lambda_0)^{\sigma_n}\big) \qquad (17)$$

where the matrices $\bar{V}$ and $\bar{P}$ are obtained from the decompositions of $\big(v_1(\lambda) \cdots v_l(\lambda)\big)$ and $\big(\rho_1(\lambda) \cdots \rho_l(\lambda)\big)$ by retaining their last $l$ rows.

Then, from (16), it is clear that $\bar{V}(\lambda_0)$ has full rank. Let $R_1$, $R_2$ be unimodular matrices such that

$$\bar{V}R_1 = L \quad \text{where } L \text{ is lower triangular}$$

$$R_2\bar{P} = U \quad \text{where } U \text{ is upper triangular.}$$

From this (17) can be factored as follows:

$$EL = R_2^{-1}UDR_1.$$

Clearly, the matrix $EL$ has the same zero structure as the matrix $UD$. By direct computation of the Smith matrix of $UD$, it follows that $(\lambda - \lambda_0)^{\sigma_{n-l+j}}$ is a factor of the $j$th diagonal element. Since $L$ has full rank at $\lambda_0$, it follows that $(\lambda - \lambda_0)^{\sigma_{n-l+j}}$ is a factor of $\epsilon_{n-l+j}$. This completes the proof. ■

Note that a similar result holds for left zeros simply by replacing $H$ with $H^T$. The following corollary restates the result of Theorem 3.2 in terms of null chains.

Corollary 3.1: A full rank, square, rational matrix $H(\lambda)$ analytic at $\lambda_0$ has a right (left) zero at $\lambda_0$ of (at least) algebraic multiplicity $\sigma$ if and only if there exists a right (left) null chain of order $\sigma$ at $\lambda_0$.

Proof: Both directions of the proof follow immediately by equating

$$u(\lambda) = x_1 + (\lambda - \lambda_0)x_2 + \cdots + (\lambda - \lambda_0)^{\sigma-1}x_\sigma.$$

Note that if $H$ has a right zero of geometric multiplicity greater than one, say $l$, then there are $l$ different right null chains (not necessarily of the same order) such that the span of the $x_1$'s equals the nullspace of $H(\lambda_0)$. Let $x^i$ ($y^i$) denote the $i$th right (left) null chain of order $\sigma_i$, then the following definition applies [1].

Definition 3.3: A canonical set of right null chains of $H(\lambda)$ at $\lambda_0$ is an ordered set of right null chains, i.e., $x^i = (x^i_1,\ldots,x^i_{\sigma_i})$ for $i = 1,\ldots,l$, such that

i) $\{x^1_1, x^2_1,\ldots,x^l_1\}$ are linearly independent,

ii) $\mathrm{span}\{x^1_1, x^2_1,\ldots,x^l_1\} = \mathcal{N}[H(\lambda_0)]$, and

iii) $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_l$.

A canonical set of left null chains is defined similarly.

Next, we show that the zero interpolation conditions of Theorem 3.1 can be stated in terms of the canonical set of right null chains of $V$ and the canonical set of left null chains of $U$ at each $\lambda_0 \in \Lambda_{UV}$. For that we need to introduce an extension of the above definition.

Definition 3.4: An extended set of right null chains of a full rank $n \times n$ rational matrix $H(\lambda)$ at $\lambda_0$ is a canonical set of right null chains augmented with $n - l$ vectors in $\mathbb{R}^n$, i.e., $\{x^{l+1}_1,\ldots,x^n_1\}$, such that the span of $\{x^1_1, x^2_1,\ldots,x^n_1\}$ is equal to $\mathbb{R}^n$. The order associated with these added chains is zero.

From the above definition, if a square rational matrix has no zeros at $\lambda_0$, then the corresponding canonical set of null chains is empty and the extended set is a basis for $\mathbb{R}^n$, e.g., the columns of an $n \times n$ identity matrix.

Next, we apply the above results and definitions to the zero interpolation conditions of a one-block problem. In the context of Theorem 3.1 we have the following equivalence: for $j = 1,\ldots,n_y$ and $k = 0,\ldots,\sigma_{V_j} - 1$,

$$(V\beta_j)^{(k)}(\lambda_0) = 0 \iff T_{\lambda_0,\sigma_{V_j}}(V)\,x^{n_y-j+1} = 0$$

where $x^i$ is an extended set of right null chains for $V$ at $\lambda_0$. The sequence of $x^i$'s has to be reversed in the above equation due to the fact that $\sigma_{V_j}$ is a nondecreasing sequence of algebraic multiplicities while an extended set of null chains is defined with the opposite ordering. Note that if $\sigma_{V_j} = 0$ then both conditions are satisfied trivially (i.e., there are no conditions). Similarly, for $i = 1,\ldots,n_u$ and $k = 0,\ldots,\sigma_{U_i} - 1$,

$$(\alpha_i U)^{(k)}(\lambda_0) = 0 \iff T_{\lambda_0,\sigma_{U_i}}(U^T)\,y^{n_u-i+1} = 0.$$

In other words, the extended sets of left and right null chains are locally (i.e., for each $\lambda_0$) equivalent to the polynomial vectors $\alpha_i$'s and $\beta_j$'s. Having made this observation, we are ready to present an alternative set of zero interpolation conditions.

Given an element of an extended set of right null chains at $\lambda_0$, $x^j$, of order $\sigma_j$, define the following polynomial vector:

$$x^j(\lambda) := x^j_1 + (\lambda - \lambda_0)x^j_2 + \cdots + (\lambda - \lambda_0)^{\sigma_j-1}x^j_{\sigma_j}$$

if $\sigma_j > 0$, and $x^j(\lambda) := x^j_1$ if $\sigma_j = 0$. Similarly, define $y^i(\lambda)$ for an element of an extended set of left null chains, $y^i$, of order $\sigma_i$. With this notation we have the following corollary.

Corollary 3.2: Given a one-block problem, the zero interpolation conditions of Theorem 3.1 are equivalent to the following: for all $\lambda_0 \in \Lambda_{UV}$,

$$\big((y^i)^T R\, x^j\big)^{(k)}(\lambda_0) = 0 \quad \text{for } i = 1,\ldots,n_u;\ j = 1,\ldots,n_y;\ k = 0,\ldots,\sigma_{y^i}(\lambda_0) + \sigma_{x^j}(\lambda_0) - 1$$

where $y^i$ and $x^j$ are elements of the extended sets of left and right null chains of $U$ and $V$ respectively, and $\sigma_{y^i}$ and $\sigma_{x^j}$ are the corresponding orders (i.e., algebraic multiplicities).

Proof: Follows directly from Theorems 3.1 and 3.2, and from the above definitions. ■

B. Computation of Null Chains

This subsection discusses a simple algorithm to compute the extended set of null chains at $\lambda_0$ of a full rank square rational matrix analytic at $\lambda_0$. Let $H(\lambda)$ denote an $n \times n$ rational matrix and assume that $\lambda_0$ is given, then the algorithm is based on the computation of a basis for the nullspace of $T_{\lambda_0,\sigma}(H)$ for increasing values of $\sigma$.

Consider the construction of an extended set of right null chains. By Definition 3.2, given some positive integer $\sigma$, any vector in the kernel of $T_{\lambda_0,\sigma}(H)$ such that $x_1 \ne 0$ is a potential member of the set. Let $B_\sigma$ denote a matrix whose columns form a basis for the right nullspace of $T_{\lambda_0,\sigma}(H)$, then the following algorithm generates an extended set of right null chains.

Step 1) Compute $B_\sigma$ for $\sigma = 1, 2, \ldots$ until the top $n$ rows are filled with zeros (no more null chains can be extracted at this point). Then the maximum order of any chain, $\sigma_1$, is given by the current value of the counter ($\sigma$) minus one. Note that, by Corollary 3.1, this iteration process is guaranteed to stop since the rational matrix $H$ is finite dimensional (i.e., its zeros have finite algebraic multiplicity).

Step 2) Let $b_i$ for $i = 1,\ldots,r$ denote each column of $B_\sigma$. Reduce the dimension of the $b_i$'s by removing all sets of $n$ contiguous zeros at the top of each vector. The result is a collection of $r$ vectors (possibly of different dimensions) such that the top $n$ entries of each one define a nonzero vector in $\mathbb{R}^n$. (Note that at least one will have dimension $n\sigma_1$.)

Step 3) Sort the resulting vectors in decreasing order of dimension. Let $l$ be the rank of the $n \times r$ matrix that results from collecting the first $n$ rows of each vector. Then, select the first $l$ vectors such that the reduced matrix that results from collecting the first $n$ rows of each vector has rank $l$. Such a collection forms a canonical set of right null chains.

Step 4) Extend the set by augmenting the collection with $n - l$ vectors such that the set of $n$ vectors formed with the first $n$ rows defines a basis in $\mathbb{R}^n$.
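The steps above can be sketched numerically as follows. This is a hedged variant rather than the authors' exact procedure: instead of stripping leading zero blocks from the columns of a single $B_\sigma$ (which presumes a basis in a convenient form), it recomputes an SVD nullspace basis for each order and greedily keeps chains whose leading vectors are linearly independent. The stopping test of Step 1 is implemented as "the nullity stops growing", which is equivalent to the top $n$ rows of $B_\sigma$ being zero. All function names are assumptions, and real Taylor coefficients (real $\lambda_0$) are assumed.

```python
import numpy as np

def block_toeplitz(coeffs, sigma):
    """Block lower-triangular Toeplitz matrix built from H_0, ..., H_{sigma-1}."""
    m, n = coeffs[0].shape
    T = np.zeros((sigma * m, sigma * n))
    for r in range(sigma):
        for c in range(r + 1):
            T[r*m:(r+1)*m, c*n:(c+1)*n] = coeffs[r - c]
    return T

def null_basis(M, tol=1e-10):
    """Orthonormal basis (as columns) of the right nullspace of M, via the SVD."""
    _, s, vh = np.linalg.svd(M)
    return vh[int(np.sum(s > tol)):].T

def canonical_right_null_chains(coeffs, tol=1e-8):
    """Return chains as arrays of shape (order, n), sorted by decreasing order."""
    n = coeffs[0].shape[1]
    # Step 1: increase sigma until the nullity of the Toeplitz matrix stops growing.
    nullity = [0]
    while True:
        sigma = len(nullity)
        if sigma > len(coeffs):
            raise ValueError("more Taylor coefficients are needed")
        nullity.append(null_basis(block_toeplitz(coeffs, sigma)).shape[1])
        if nullity[-1] == nullity[-2]:
            break
    max_order = len(nullity) - 2
    # Steps 2-3 (variant): from the highest order down, keep chains whose
    # leading vector x_1 is independent of the leading vectors already chosen.
    chains, leads = [], np.zeros((n, 0))
    for k in range(max_order, 0, -1):
        B = null_basis(block_toeplitz(coeffs, k))
        while True:
            tops = B[:n, :]
            if leads.shape[1]:
                Q, _ = np.linalg.qr(leads)
                resid = tops - Q @ (Q.T @ tops)   # part orthogonal to chosen leads
            else:
                resid = tops
            norms = np.linalg.norm(resid, axis=0)
            best = int(np.argmax(norms))
            if norms[best] <= tol:
                break
            leads = np.column_stack([leads, tops[:, best]])
            chains.append(B[:, best].reshape(k, n))
    return chains
```

Step 4 (completing the leading vectors to a basis of $\mathbb{R}^n$) is omitted; any basis completion of the columns of `leads` would do.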

If the system $H(\lambda)$ is given in state-space form, say $[A, B, C, D]$, then the Toeplitz matrices $T_{\lambda_0,\sigma}(H)$ can be easily computed using the following equation (see Definition 3.1):

$$H_k = \begin{cases} \lambda_0 C(I - \lambda_0 A)^{-1}B + D & \text{for } k = 0 \\ C(I - \lambda_0 A)^{-k-1}A^{k-1}B & \text{for } k = 1, 2, \ldots \end{cases}$$

Note that $(I - \lambda_0 A)^{-1}$ always exists since $\lambda_0$ is in the unit disk and $H$ is stable (i.e., analytic in the closed unit disk). A word of warning is necessary, however, when $\lambda_0$ is close to the unit circle and $A$ has a stable eigenvalue that is also close to the unit circle and next to $\lambda_0$. Such cases may give rise to numerical difficulties. Besides this fact, the rest of the algorithm only involves the computation of nullspaces that can be done efficiently through the well known QR or singular value decompositions [22].
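A minimal sketch of this state-space formula follows, assuming the $\lambda$-transform convention $H(\lambda) = \lambda C(I - \lambda A)^{-1}B + D$ implied by the $k = 0$ case. The function name and the scalar test system are assumptions for illustration only.

```python
import numpy as np

def taylor_coeffs_state_space(A, B, C, D, lam0, count):
    """Taylor coefficients H_0, ..., H_{count-1} of H(lambda) at lam0."""
    n = A.shape[0]
    M = np.linalg.inv(np.eye(n) - lam0 * A)      # (I - lam0 A)^{-1}
    coeffs = [lam0 * C @ M @ B + D]              # k = 0
    M_pow = M @ M                                # (I - lam0 A)^{-(k+1)} for k = 1
    A_pow = np.eye(n)                            # A^{k-1} for k = 1
    for _ in range(1, count):
        coeffs.append(C @ M_pow @ A_pow @ B)
        M_pow = M_pow @ M
        A_pow = A_pow @ A
    return coeffs

# Hypothetical scalar system: H(lambda) = lambda / (1 - 0.5 lambda).
A = np.array([[0.5]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
H0, H1, H2 = taylor_coeffs_state_space(A, B, C, D, lam0=0.2, count=3)
```

For this scalar system the coefficients can be checked by hand: $H(0.2) = 0.2/0.9$, $H'(0.2) = 1/0.9^2$, and $H''(0.2)/2 = 0.5/0.9^3$.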

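C. A Simple Example

In order to illustrate the workings of the algorithm introduced in the previous section, a simple example is presented. Let $H(\lambda)$ be a $3 \times 3$ polynomial matrix given by:

$$H(\lambda) = \begin{pmatrix} (\lambda - 0.5)^2 & \lambda(\lambda + 2)(\lambda - 0.5) & 0 \\ (\lambda - 0.5)^3 & \lambda(\lambda - 0.5) & 0 \\ 0 & 0 & \lambda^2 \end{pmatrix}.$$

We have chosen a polynomial matrix just to make the example tractable without the aid of a computer. Let us construct an extended set of right null chains for the zero at $\lambda_0 = 0.5$. According to Step 1, we compute the nullspace of $T_{\lambda_0,\sigma}(H)$ for $\sigma = 1, 2, \ldots$. In particular, for $\sigma = 3$ we have:

$$T_{0.5,3}(H) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & .25 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & .5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & .5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & .25 & 0 & 0 & 0 \\ 1 & 1.5 & 0 & 0 & .5 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & .5 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & .25 \end{pmatrix}; \qquad B_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Clearly, the first three rows of $B_3$ are zero so we stop increasing $\sigma$. Then, the maximum algebraic multiplicity of $\lambda_0 = 0.5$ is two, i.e., $\sigma_1 = 2$. Next (Step 2), reduce each column of $B_3$ by eliminating the leading blocks of zeros to get:

$$b_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}; \qquad b_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}; \qquad b_3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^T.$$

Then (Step 3), reorder the set of vectors in decreasing dimension, i.e., $\{b_3, b_1, b_2\}$, and compute the rank of the matrix formed with the first three rows:

$$l = \mathrm{rank}\begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} = 2.$$

Then, the canonical set of right null chains is given by $\{x^1, x^2\}$ where

$$x^1 = b_3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^T \quad \text{and} \quad x^2 = b_2 = \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}^T$$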

[«W'>L  ”  £  ~  0 


^)2r[(Af)(fc>] 


with their corresponding orders (i.e., algebraic multiplicities) being $\sigma_1 = 2$ and $\sigma_2 = 1$. This indicates that the geometric multiplicity of $\lambda_0$ is two. Finally (Step 4), to get an extended set of right null chains we augment the collection with $x^3 = (0\ \ 0\ \ 1)^T$ having order $\sigma_3 = 0$ (by definition).
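The chains found in this example can be checked directly against Lemma 3.1: for each chain, the polynomial vector $H(\lambda)x(\lambda)$ should vanish at $\lambda_0 = 0.5$ to the stated order. The sketch below does this with NumPy's polynomial class, using $H(\lambda)$ as written above; the helper names are assumptions made for the illustration.

```python
from numpy.polynomial import Polynomial as P

lam = P([0.0, 1.0])
s = lam - 0.5
zero = P([0.0])

# H(lambda) as given in the example.
H = [[s**2, lam * (lam + 2) * s, zero],
     [s**3, lam * s,             zero],
     [zero, zero,                lam**2]]

def vanishing_order(p, lam0, cap=10, tol=1e-12):
    """Number of consecutive derivatives of p (from order 0) that vanish at lam0."""
    order = 0
    while order < cap and abs(p(lam0)) < tol:
        p = p.deriv()
        order += 1
    return order  # capped at `cap` for the identically zero polynomial

def chain_order(chain, lam0=0.5):
    """Order to which H(lambda) x(lambda) vanishes at lam0 for a chain [x_1, x_2, ...]."""
    n = len(H)
    # x(lambda) = x_1 + (lambda - lam0) x_2 + ...
    x = [sum(vec[j] * (lam - lam0)**k for k, vec in enumerate(chain)) for j in range(n)]
    Hx = [sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]
    return min(vanishing_order(p, lam0) for p in Hx)

x1 = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # chain of order 2
x2 = [(0.0, 1.0, 0.0)]                    # chain of order 1
x3 = [(0.0, 0.0, 1.0)]                    # added vector of order 0
orders = [chain_order(c) for c in (x1, x2, x3)]
```

The resulting `orders` list reproduces $\sigma_1 = 2$, $\sigma_2 = 1$, and $\sigma_3 = 0$.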

IV.  Duality  and  Existence 

With Theorem 3.1 we have established a compact algebraic characterization of the set $\mathcal{S}$. Next, we need to interpret these results in the context of (7), which calls for the identification of the subspace of $\ell_\infty^{n_z\times n_w}$ which annihilates $\mathcal{S}$.

Following the approach in [11] and [29], we write the zero interpolation conditions as functionals acting on $R$. Indeed, for all $(i, j, k)$ in the ranges established in Theorem 3.1, for $t = 0, 1, \ldots$, and all $\lambda_0 \in \Lambda_{UV}$, define $RF_{ijk\lambda_0}$ and $IF_{ijk\lambda_0}$ in $\ell_\infty^{n_z\times n_w}$ such that

[«W'>]„  -  E  £  “*(j  ■  0 


where $\Re(\lambda)$ and $\Im(\lambda)$ denote the real and imaginary part of $\lambda$ respectively, and $\alpha_{iq}$ denotes the $q$th column of $\alpha_i$, while $\beta_{jp}$ denotes the $p$th row of $\beta_j$. By straightforward algebra it can be shown that $\langle R, RF_{ijk\lambda_0}\rangle = 0$ and $\langle R, IF_{ijk\lambda_0}\rangle = 0$ if and only if $R$ satisfies the zero interpolation conditions of Theorem 3.1. Note that only a finite number of sequences are required, thus the subspace spanned by the sequences associated with the zero interpolations is finite dimensional. In fact, the number of functionals is given by:

$$c_z := \sum_{\lambda_0 \in \Lambda_{UV}} \sum_{i=1}^{n_u} \sum_{j=1}^{n_y} \bigl(\sigma_{U_i}(\lambda_0) + \sigma_{V_j}(\lambda_0)\bigr). \qquad (20)$$

A note should be made on the way $c_z$ is computed. If a given $\lambda_0 \in \Lambda_{UV}$ is complex then $\bar{\lambda}_0 \in \Lambda_{UV}$ too, since $U$ and $V$ are real-rational. However, for the purpose of constructing functionals, only one of each pair of complex-conjugate zeros should be considered since the other one would generate redundant functionals. But, for the purpose of counting the number of independent functionals (i.e., computing $c_z$), both zeros should be included in $\Lambda_{UV}$, since a complex-conjugate pair of zeros generates twice as many functionals as a real zero.
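The counting rule above can be checked on a small example. The following is a minimal sketch, assuming the multiplicities $\sigma_{U_i}(\lambda_0)$ and $\sigma_{V_j}(\lambda_0)$ are stored in two hypothetical dictionaries keyed by the zero; the data are invented for illustration and do not come from the paper. It compares the triple sum over all zeros with a count that builds functionals from only one zero of each conjugate pair.

```python
def count_by_formula(sigma_u, sigma_v):
    """Evaluate the triple sum over zeros, rows i and columns j."""
    return sum(su + sv
               for zero in sigma_u
               for su in sigma_u[zero]
               for sv in sigma_v[zero])


def count_by_construction(sigma_u, sigma_v):
    """Count functionals built from one zero of each conjugate pair only."""
    total = 0
    for zero in sigma_u:
        if zero.imag < 0:
            continue  # the conjugate partner generates the same functionals
        per_zero = sum(su + sv for su in sigma_u[zero] for sv in sigma_v[zero])
        # A real zero yields only real-part functionals; a complex zero
        # yields a real-part and an imaginary-part functional for each k.
        total += per_zero if zero.imag == 0 else 2 * per_zero
    return total


# Hypothetical data: one real zero and one conjugate pair, with n_u = n_y = 1.
sigma_u = {0.5 + 0j: [1], 0.5j: [2], -0.5j: [2]}
sigma_v = {0.5 + 0j: [0], 0.5j: [1], -0.5j: [1]}

c_z_formula = count_by_formula(sigma_u, sigma_v)
c_z_built = count_by_construction(sigma_u, sigma_v)
print(c_z_formula, c_z_built)
```

Both counts agree, which is the point of the remark: the conjugate zero is skipped when building functionals but must be kept in the sum that defines $c_z$.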

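Next, we look at the rank interpolation conditions [i.e., conditions in ii)]. Again, these algebraic conditions can be viewed as convolution of sequences. For $i = n_u + 1, \ldots, n_z$ and $q = 1, \ldots, n_w$, define the following sequence of $n_z \times n_w$ matrices:

$$X_{\alpha_i q t}(l) := \bigl(0 \ \cdots \ 0 \ \ \alpha_i^T(t - l) \ \ 0 \ \cdots \ 0\bigr) \quad (q\text{th column}) \qquad (21)$$

where $t, l \in \mathbb{Z}_+$. Similarly, for $j = n_y + 1, \ldots, n_w$ and $p = 1, \ldots, n_z$, define

$$X_{\beta_j p t}(l) := \begin{pmatrix} 0 \\ \vdots \\ \beta_j^T(t - l) \\ \vdots \\ 0 \end{pmatrix} \quad (p\text{th row}). \qquad (22)$$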


•prt(«->)«[(*')<"] 


Then, $\langle R, X_{\alpha_i q t}\rangle = 0$ and $\langle R, X_{\beta_j p t}\rangle = 0$ for $t = 0, 1, \ldots$ if and only if $R$ satisfies the rank interpolation conditions of Theorem 3.1. Note that, in contrast with the zero interpolation sequences, the linear span of the $X_{\alpha_i q t}$'s and $X_{\beta_j p t}$'s is infinite dimensional since for every $(i,q,p)$, $t$ can take infinitely many values (i.e., $t \in \mathbb{Z}_+$).
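The statement that the rank interpolation functionals act as convolutions can be verified numerically. The sketch below assumes the standard pairing $\langle R, X\rangle = \sum_l \sum_{p,q} R_{pq}(l) X_{pq}(l)$ and a sequence $X(l)$ whose $q$th column holds $\alpha_i^T(t-l)$; the polynomial row vector `alpha` and the sequence `r` are made-up integer data, and all function names are illustrative.

```python
def x_alpha(alpha, q, t, l, nz, nw):
    """n_z x n_w matrix whose q-th column is alpha^T(t - l), zero elsewhere."""
    m = [[0] * nw for _ in range(nz)]
    s = t - l
    if 0 <= s < len(alpha):
        for p in range(nz):
            m[p][q] = alpha[s][p]
    return m


def inner_product(r, alpha, q, t):
    """Pairing <R, X>: sum over l and all entries of R(l) times X(l)."""
    nz, nw = len(r[0]), len(r[0][0])
    total = 0
    for l, r_l in enumerate(r):
        x_l = x_alpha(alpha, q, t, l, nz, nw)
        total += sum(r_l[p][c] * x_l[p][c] for p in range(nz) for c in range(nw))
    return total


def conv_entry(r, alpha, q, t):
    """q-th entry of the convolution (alpha * R)(t)."""
    return sum(alpha[s][p] * r[t - s][p][q]
               for s in range(len(alpha)) if 0 <= t - s < len(r)
               for p in range(len(alpha[s])))


# alpha(lambda) = (1 - 2*lambda, lambda), stored as coefficients alpha[s][p].
alpha = [[1, 0], [-2, 1]]
# R(l) for l = 0, 1, 2, each a 2 x 2 matrix.
r = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, -1]]]

functional_matches_convolution = all(
    inner_product(r, alpha, q, t) == conv_entry(r, alpha, q, t)
    for q in range(2) for t in range(4))
print(functional_matches_convolution)
```

Requiring the pairing to vanish for every $t$ is therefore the same as requiring the $q$th entry of $\alpha_i * R$ to vanish identically.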
The next theorem gives a sufficient condition for the existence of an optimal solution to (5). The proof is omitted since the arguments involved are essentially the same as those in [11], [29].

Theorem 4.1: If every $\lambda_0 \in \Lambda_{UV}$ is strictly inside the unit disk, then there exists $R^0 \in \mathcal{S}$ such that

$$\mu^0 = \|H - R^0\|_1 = \inf_{R \in \mathcal{S}} \|H - R\|_1.$$

Note, however, that the above result is more general than that in [29], where it is assumed that $U$ and $V$ have square partitions with no zeros on the unit circle. This extra assumption was avoided by determining the full set of interpolation conditions directly from the Smith-McMillan decomposition of $U$ and $V$.


V. Optimization and Linear Programming

This section will establish the equivalence between the primal-dual pair of optimization problems (5)-(7) and a primal-dual pair of infinite dimensional linear programs.

By definition, $\mathcal{S}^\perp \subset l_\infty^{n_z \times n_w}$ is the linear span of the sequences (18)-(22), and $G$ is any element in that subspace with infinity norm not greater than one. That is,

$$G \in \operatorname{span}\{RF_{ijk\lambda_0},\ IF_{ijk\lambda_0},\ X_{\alpha_i q t},\ X_{\beta_j p t}\} \qquad (23)$$

with the appropriate index ranges.

In order to bring (5) and (7) into a standard linear programming form, it is convenient to redefine the notation, the purpose being to express both the objective and the feasible subspace in (infinite) matrix form. This is possible since the constraints that specify the feasible subspace $\mathcal{S}$ are no more and no less than an infinite collection of linear functionals annihilating the sequence $R$, which can be expressed as an infinite collection of equality constraints on the elements of the sequence $\Phi$.

To bring the primal objective function $\|\Phi\|_1$ into linear form and avoid the nonlinearity built into the one norm, we use a standard change of variables from linear programming: let $\Phi = \Phi^+ - \Phi^-$, where $\Phi^+$ and $\Phi^-$ are sequences of $n_z \times n_w$ matrices with nonnegative entries. That is, with a slight abuse of notation, $\Phi^+ \ge 0$ and $\Phi^- \ge 0$. Then, the $l_1$ norm of $\Phi$ takes the form $\max_i \sum_{j=1}^{n_w} \sum_{t=0}^{\infty} \bigl(\phi^+_{ij}(t) + \phi^-_{ij}(t)\bigr)$ which is linear in $(\Phi^+, \Phi^-)$. This expression holds only if, for any $(i,j,t)$, either $\phi^+_{ij}(t)$ or $\phi^-_{ij}(t)$ is zero, which is a guaranteed property of the optimal solution. Indeed, if a feasible solution is such that $\phi^+_{ij}(t)$ and $\phi^-_{ij}(t)$ are strictly positive, then reducing both variables by $\min(\phi^+_{ij}(t), \phi^-_{ij}(t))$ reduces the value of the cost and does not violate feasibility since the difference remains the same, and further, one of the two variables becomes zero. Therefore, the optimal solution will always be such that either $\phi^+_{ij}(t)$ or $\phi^-_{ij}(t)$ is zero. Note that this transformation doubles the number of variables representing the closed-loop response.

Consequently, the primal problem (5) can be restated as follows:

$$\mu^0 = \inf \mu$$

subject to

$$\sum_{j=1}^{n_w} \sum_{t=0}^{\infty} \bigl(\phi^+_{ij}(t) + \phi^-_{ij}(t)\bigr) \le \mu \quad \text{for } i = 1, \ldots, n_z \qquad (24)$$

$$\Phi^+ - \Phi^- - H \in \mathcal{S}.$$

Next, we shift attention to the linear constraints representing the feasible set. From the previous discussion it is clear that a given $\Phi$ is feasible (i.e., there exists a stable $Q$ such that $\Phi = H - UQV$) if and only if

$$\begin{cases} \langle \Phi, RF_{ijk\lambda_0}\rangle = \langle H, RF_{ijk\lambda_0}\rangle \\ \langle \Phi, IF_{ijk\lambda_0}\rangle = \langle H, IF_{ijk\lambda_0}\rangle \end{cases} \qquad (25)$$

$$\begin{cases} \langle \Phi, X_{\alpha_i q t}\rangle = \langle H, X_{\alpha_i q t}\rangle \\ \langle \Phi, X_{\beta_j p t}\rangle = \langle H, X_{\beta_j p t}\rangle \end{cases} \quad \text{for } \begin{cases} i = n_u + 1, \ldots, n_z \\ j = n_y + 1, \ldots, n_w \\ q = 1, \ldots, n_w \\ p = 1, \ldots, n_z \\ t = 0, 1, 2, \ldots \end{cases} \qquad (26)$$

Each of these equations can be viewed as a linear equality constraint on the sequence $\Phi$.

At this point it is convenient to drop the tensor notation used so far and introduce a more compact, computer-ready matrix notation. Let $M_{ij}$ denote an infinite matrix mapping $l_1$ to $\mathbb{R}^{c_z}$, formed by collecting those coefficients of the zero interpolation functionals that act on the sequence $\phi_{ij}$. Similarly, define $\bar{M}_{ij}$ to be an infinite matrix mapping $l_1$ to $l_1$, formed by collecting those coefficients of the rank interpolation functionals that act on $\phi_{ij}$. With this notation, the set of feasible closed-loop maps is characterized by the following set of equality constraints:

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} M_{ij}\phi_{ij} = \sum_{i=1}^{n_z} \sum_{j=1}^{n_w} M_{ij}h_{ij} =: b_1 \in \mathbb{R}^{c_z} \qquad (27)$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} \bar{M}_{ij}\phi_{ij} = \sum_{i=1}^{n_z} \sum_{j=1}^{n_w} \bar{M}_{ij}h_{ij} =: b_2 \in l_1. \qquad (28)$$

Therefore, the primal optimization problem (5) is equivalent to the following infinite dimensional linear program:

$$\mu^0 = \min_{\mu,\ \xi,\ \phi^+_{ij},\ \phi^-_{ij}} \mu$$

subject to

$$\xi(i) + \sum_{j=1}^{n_w} \sum_{t=0}^{\infty} \bigl(\phi^+_{ij}(t) + \phi^-_{ij}(t)\bigr) = \mu \quad \text{for } i = 1, \ldots, n_z$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} M_{ij}(\phi^+_{ij} - \phi^-_{ij}) = b_1$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} \bar{M}_{ij}(\phi^+_{ij} - \phi^-_{ij}) = b_2$$

$$\xi,\ \phi^+_{ij},\ \phi^-_{ij} \ge 0 \qquad (29)$$

where $\xi \in \mathbb{R}^{n_z}$ is a positive vector of slack variables. Note that the above linear program is infinite dimensional in the number of variables (i.e., dimension of any $\phi_{ij}$) and the number of constraints (i.e., dimension of $b_2$).
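The change of variables $\Phi = \Phi^+ - \Phi^-$ that underlies this linear program can be illustrated on a small, finitely supported example. This is a sketch with invented data, indexed as `phi[i][j][t]`; it only checks that the linear cost equals the $l_1$ norm when the two parts never overlap, and that overlapping parts leave the difference unchanged while raising the cost.

```python
def split(phi):
    """Positive and negative parts, so that phi = plus - minus entrywise."""
    plus = [[[max(x, 0.0) for x in seq] for seq in row] for row in phi]
    minus = [[[max(-x, 0.0) for x in seq] for seq in row] for row in phi]
    return plus, minus


def l1_norm(phi):
    """Maximum over rows i of the sum over j and t of |phi_ij(t)|."""
    return max(sum(abs(x) for seq in row for x in seq) for row in phi)


def lp_cost(plus, minus):
    """Maximum over rows i of the sum over j and t of plus + minus."""
    return max(
        sum(p + m for seq_p, seq_m in zip(row_p, row_m)
            for p, m in zip(seq_p, seq_m))
        for row_p, row_m in zip(plus, minus))


def pad(part, amount):
    """Add the same amount to every entry of one part."""
    return [[[x + amount for x in seq] for seq in row] for row in part]


# Made-up closed-loop response with n_z = n_w = 2 and two time samples.
phi = [[[1.0, -0.5], [0.25, 0.0]],
       [[-2.0, 0.0], [0.0, 0.5]]]

plus, minus = split(phi)
tight_cost = lp_cost(plus, minus)
# Same difference phi, but both parts are strictly positive everywhere.
loose_cost = lp_cost(pad(plus, 0.25), pad(minus, 0.25))
print(l1_norm(phi), tight_cost, loose_cost)
```

The padded pair represents the same $\Phi$ at a strictly larger cost, which is why a minimizer never keeps both parts positive at the same entry.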

In order to complete this discussion, it remains to show that problem (7) is also equivalent to a linear programming problem. In fact, it can be shown that such a problem corresponds to the standard dual formulation of problem (29). To illustrate this fact, we will simply write the dual form of (29) and compare it to (7). Let $\gamma \in l_\infty$ denote the sequence of dual variables. To get more insight into the dual problem, let us partition $\gamma$ according to the natural partitioning of the set of equality constraints. That is, let $\gamma =: (-\gamma_0\ \ \gamma_1\ \ \gamma_2)^T$, where $\gamma_0 \in \mathbb{R}^{n_z}$, $\gamma_1 \in \mathbb{R}^{c_z}$ and $\gamma_2 \in l_\infty$ (it is convenient to have the sign of $\gamma_0$ changed). Then, the standard dual linear program of (29) is given by:


$$\mu^0 = \max_{\gamma_0, \gamma_1, \gamma_2} \langle b_1, \gamma_1\rangle + \langle b_2, \gamma_2\rangle$$

subject to

$$\gamma_0 \ge 0, \quad \sum_{i=1}^{n_z} \gamma_0(i) \le 1$$

$$-\gamma_0(i) \le (M_{ij}^T\gamma_1 + \bar{M}_{ij}^T\gamma_2)(k) \le \gamma_0(i) \quad \text{for } \begin{cases} i = 1, \ldots, n_z \\ j = 1, \ldots, n_w \\ k = 0, 1, \ldots \end{cases} \qquad (30)$$

If one compares the above linear program with problem (7), the following relationships become apparent: 1) $\gamma_1$ and $\gamma_2$ are nothing but the coefficients that combine the linear functionals associated with the zero interpolation conditions and the rank interpolation conditions, respectively, to obtain $G$; 2) the objective function results from expanding $\langle H, G\rangle$ when $G$ is expressed as a linear combination of the elements in the generator of $\mathcal{S}^\perp$ with coefficients $(\gamma_1, \gamma_2)$; and 3) the set of inequality constraints is equivalent to $\|G\|_\infty \le 1$: while the second line of inequalities bounds $G$ componentwise, the first line bounds the matrix $\infty$-norm of $G$ by one.

VI. One-Block Problems

One-block problems have a very specific interpolation structure, namely no rank interpolation conditions. From a primal formulation point of view [see (29)], this simplifies the problem significantly by bringing the number of equality constraints down to a finite value, namely $c_z + n_z$. There remains, however, an infinite number of variables represented by the $\phi_{ij}$'s in $l_1$. Nevertheless, it has been shown by looking at the structure of the dual problem, that the underlying problem is finite dimensional [9]. Indeed, the dual formulation has an infinite number of inequality constraints but retains a finite number of variables:

$$\mu^0 = \max_{\gamma_0, \gamma_1} \langle b_1, \gamma_1\rangle$$

subject to

$$\gamma_0 \ge 0, \quad \sum_{i=1}^{n_z} \gamma_0(i) \le 1$$

$$-\gamma_0(i) \le (M_{ij}^T\gamma_1)(k) \le \gamma_0(i) \quad \text{for } \begin{cases} i = 1, \ldots, n_z \\ j = 1, \ldots, n_w \\ k = 0, 1, \ldots \end{cases} \qquad (31)$$

Recall that $M_{ij}^T$ is the matrix representation of an operator mapping $\mathbb{R}^{c_z}$ to $l_\infty$. However, with Assumption 1 holding, the actual range of $M_{ij}^T$ is in $c_0$ since each of the columns of $M_{ij}^T$ is in $c_0$ and there are only finitely many of them. This is exploited in the following lemma from [34].

Lemma 6.1: Let $M$ be a full column rank infinite matrix mapping $\mathbb{R}^n$ to $c_0$. Then there exists a positive integer $N$ such that

$$\|(I - P_N)Mx\|_\infty < \|P_N Mx\|_\infty$$

for all nonzero $x \in \mathbb{R}^n$.

Note, in particular, that $N$ is independent of $x$ and is only a function of $M$.

In  other  words,  given  a  matrix  mapping  a  finite  dimen¬ 
sional  space  to  c0,  it  is  always  possible  to  bound  the  index 
at  which  the  infinity  norm  of  any  sequence  in  the  range  is 
achieved. 

The following theorem extends a result from [9] by exploiting this structure.

Theorem 6.1: The exact solution of a one-block $l_1$-optimal control problem is given by the following finite dimensional (dual) linear program:

$$\mu^0 = \max_{\gamma_0, \gamma_1} \langle b_1, \gamma_1\rangle$$

subject to

$$\gamma_0 \ge 0, \quad \sum_{i=1}^{n_z} \gamma_0(i) \le 1$$

$$-\gamma_0(i) \le (M_{ij}^T\gamma_1)(k) \le \gamma_0(i) \quad \text{for } \begin{cases} i = 1, \ldots, n_z \\ j = 1, \ldots, n_w \\ k = 0, \ldots, N_{ij} < \infty \end{cases} \qquad (32)$$

Proof: Form matrices $M_{ij}^T$ as defined before. Assume they have full column rank (if not, reduce the number of columns). Apply Lemma 6.1 to each $M_{ij}^T$ and let $N_{ij}$ denote the corresponding index bound. Then, we claim that for every feasible solution of problem (31) all inequalities of the form $|(M_{ij}^T\gamma_1)(k)| \le \gamma_0(i)$ for $k > N_{ij}$ are inactive constraints (i.e., the inequality is strict) and they can be ignored in the solution. Indeed, by Lemma 6.1, if there is an active constraint for $k > N_{ij}$, then there must have been a violation of a constraint for some $k < N_{ij}$ since the $l_\infty$ norm of the sequence $M_{ij}^T\gamma_1$ is attained before $N_{ij}$ and is always bounded by $\gamma_0(i)$. ■

This  fact  has  an  immediate  and  important  implication 
on  the  primal  linear  programming  formulation  of  one- 
block  problems.  Due  to  the  alignment  conditions,  if  a  dual 
optimal  solution  is  such  that  all  inequality  constraints  are 
inactive  for  k  >  N,  then  the  primal  optimal  solution  is 
such  that  it  vanishes  for  k  >  N. 

Corollary 6.1: For any one-block problem, the $l_1$-optimal closed-loop response, $\Phi^0$, has finite support (i.e., finite pulse response). Furthermore, each entry $\phi^0_{ij}$ has support no greater than $N_{ij}$.

Note that the $N_{ij}$'s provide a priori bounds on the lengths of the optimal $\phi_{ij}$'s. Moreover, these bounds are independent of $H$ and only depend on the zero interpolation structure of the problem.

We conclude this section with an interesting property of most one-block problems, regarding the $l_1$-norm of each row of the optimal solution.

Corollary 6.2: Given a one-block problem, if for some $i \in \{1, \ldots, n_z\}$ and $j \in \{1, \ldots, n_w\}$ the matrix $M_{ij}^T$ has full column rank, then $\|(\Phi^0)_i\|_1 = \mu^0$.

Proof: Assume $\|(\Phi^0)_i\|_1 < \mu^0$, then $\xi(i) > 0$. By the alignment conditions, this implies that $\gamma_0(i) = 0$, and in view of (32) and the rank condition on $M_{ij}^T$, $\gamma_1$ must be zero. But this implies that $\mu^0 = 0$ which is a contradiction. ■

It should be noted that there are some pathological cases where the rank condition of $M_{ij}^T$ is violated. For instance, if the given one-block problem is in fact a combination of two or more totally decoupled subproblems, then some $M_{ij}^T$'s will have entire columns of zeros. In most cases, however, the solution is such that the norm of each row of $\Phi^0$ is equal to $\mu^0$. It is interesting to point out the analogy between this aspect of the $l_1$-optimal solution of one-block problems, and the equivalent in $\mathcal{H}_\infty$ optimization. In the first one, the same "gain" is achieved at all outputs while in the second one the same "gain" is achieved at all frequencies (i.e., inner solution). These are direct consequences of the corresponding norm definitions. Furthermore, the analogy extends to the multiblock case in the sense that this property does not hold in general.

VII.  Multiblock  Problems 

The exact solution of the one-block problem rests on the fact that the primal linear programming formulation has only finitely many equality constraints (or, equivalently, the dual formulation has finitely many variables). The multiblock problem, however, is characterized by a primal and dual formulation with an infinite number of variables and constraints. So, in principle, one can attempt to get approximate solutions by an appropriate truncation of the original problem.

There are basically two approximation methods reported in the literature. The first one, known as the finitely many variables (FMV) approximation, was originally introduced in [11] and further developed in [29], [34]. It results from constraining the support of the closed-loop response $\Phi$, thus providing a suboptimal finitely supported feasible solution to the problem. In the second approach, known as the finitely many equations (FME) approximation [6], [33], only finitely many equality constraints are retained in the primal formulation of the problem, the solution of which is superoptimal but infeasible. Its value is complementary to the first approach in the sense that it generates lower bounds of the optimal norm, $\mu^0$.

The  next  two  subsections  give  a  more  detailed  descrip¬ 
tion  of  these  methods  along  with  their  main  characteris¬ 
tics.  They  do  not  contain  new  results. 

A.  The  FMV  Approximation  Method 

Let $N$ be the order of approximation or support of $\Phi$, then the FMV primal formulation is given by the following linear program:

$$\bar{\nu}_N := \min_{\mu,\ \xi,\ \phi^+_{ij},\ \phi^-_{ij}} \mu$$

subject to

$$\xi(i) + \sum_{j=1}^{n_w} \sum_{k=0}^{N} \bigl(\phi^+_{ij}(k) + \phi^-_{ij}(k)\bigr) = \mu \quad \text{for } i = 1, \ldots, n_z$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} M_{ij}(\phi^+_{ij} - \phi^-_{ij}) = b_1$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} \bar{M}_{ij}(\phi^+_{ij} - \phi^-_{ij}) = b_2$$

$$\phi^+_{ij}(k) = \phi^-_{ij}(k) = 0 \quad \text{for } k > N$$

$$\xi,\ \phi^+_{ij},\ \phi^-_{ij} \ge 0. \qquad (33)$$

Note that without the constraints $\phi^+_{ij}(k) = \phi^-_{ij}(k) = 0$ for $k > N$, (33) is equivalent to the full (untruncated) optimization problem. Clearly, the added constraints will make $\bar{\nu}_N \ge \mu^0$ in general. It is yet unclear, however, if the resulting problem is finite dimensional or not, since we still carry an infinite number of constraints. A closer look at the matrices $\bar{M}_{ij}$ will answer this question.

Recall  that  these  matrices  represent  the  rank  interpola¬ 
tion  conditions  (albeit  some  specific  reordering)  of  the 


DIAZ-BOBILLO  AND  DAHLEH:  MINIMIZATION  OF  THE  MAXIMUM  PEAK-TO-PEAK  GAIN 




form (see Theorem 3.1):

$$\begin{pmatrix} \alpha_{n_u+1} \\ \vdots \\ \alpha_{n_z} \end{pmatrix} * \Phi = \begin{pmatrix} \alpha_{n_u+1} \\ \vdots \\ \alpha_{n_z} \end{pmatrix} * H$$

and

$$\Phi * (\beta_{n_y+1}\ \cdots\ \beta_{n_w}) = H * (\beta_{n_y+1}\ \cdots\ \beta_{n_w})$$

where the results from the right-hand side convolutions are collected in the infinite vector $b_2$. The matrix representation of the convolution of the $\alpha_i$'s and $\beta_j$'s on the different entries of $\Phi$, say $\phi_{ij}$, is precisely given by $\bar{M}_{ij}$. Therefore, such infinite matrices will have a band structure inherited from the fact that the $\alpha_i(\lambda)$'s and $\beta_j(\lambda)$'s are polynomials.

In view of this particular structure, forcing $\phi_{ij}(k) = 0$ for $k > N$ will make the product $\bar{M}_{ij}\phi_{ij}$ eventually vanish for $k > N + \text{constant}$, where the constant depends on the order of the polynomials $\alpha_i(\lambda)$'s and $\beta_j(\lambda)$'s. If, however, the infinite vector $b_2$ is not zero at that point, then the equality constraints will be violated for any $\Phi$, implying that the added constraints have transformed the feasible set into an empty set and that the linear program has no solution. Furthermore, this will always be the case if $b_2$ has infinite support, no matter how large $N$ is chosen to be. This leads to the following theorem and corollary (equivalent results can be found in [29]).
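The support argument above can be reproduced with scalar sequences. The sketch below is only an illustration under simplifying assumptions: `alpha_bad` and `alpha_good` stand in for a single polynomial $\alpha_i(\lambda)$, `h` is a geometric sequence truncated at a finite horizon as a stand-in for an infinitely supported $H$, and `phi` is an arbitrary response of support $N = 3$. None of these values come from the paper.

```python
def convolve(a, x, n):
    """First n samples of the convolution a * x."""
    out = [0.0] * n
    for s, a_s in enumerate(a):
        for t, x_t in enumerate(x):
            if s + t < n:
                out[s + t] += a_s * x_t
    return out


def last_nonzero(seq, tol=1e-12):
    """Index of the last entry larger than tol in magnitude, or -1."""
    idx = [k for k, v in enumerate(seq) if abs(v) > tol]
    return idx[-1] if idx else -1


horizon = 30
h = [0.5 ** k for k in range(horizon)]   # samples of 1 / (1 - 0.5*lambda)
phi = [1.0, 0.3, -0.2, 0.1]              # support N = 3

alpha_bad = [1.0, -2.0]                  # 1 - 2*lambda
alpha_good = [1.0, -0.5]                 # 1 - 0.5*lambda

# Left-hand side: a polynomial times a length-(N+1) sequence dies out at N + 1.
lhs_support = last_nonzero(convolve(alpha_bad, phi, horizon))
# Right-hand side with alpha_bad never dies out inside the horizon.
b_bad_support = last_nonzero(convolve(alpha_bad, h, horizon))
# Right-hand side with alpha_good collapses to a single sample.
b_good_support = last_nonzero(convolve(alpha_good, h, horizon))
print(lhs_support, b_bad_support, b_good_support)
```

With `alpha_bad` the right-hand side stays nonzero up to the end of the horizon, so no finitely supported `phi` can match it; with `alpha_good` the right-hand side has finite support and a match becomes possible.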

Theorem 7.1: Given a multiblock problem, there exists a finitely supported feasible solution, $\Phi$, if and only if $\alpha_i * H$ and $H * \beta_j$ are finitely supported for $i = n_u + 1, \ldots, n_z$ and $j = n_y + 1, \ldots, n_w$.

Corollary 7.1: Given a positive integer $N$, the FMV problem (33) has a nonempty feasible set and therefore a solution, if and only if $(\alpha_i * H)(k) = 0$ and $(H * \beta_j)(k) = 0$ for $k > N + \text{constant}$, $i = n_u + 1, \ldots, n_z$ and $j = n_y + 1, \ldots, n_w$, where the constant depends on the order of $\alpha_i$ and $\beta_j$.

It  is  clear  from  the  above  results  that  there  is  a  class  of 
multiblock  problems  for  which  the  FMV  method  fails 
regardless  of  the  order  of  approximation  N.  Also,  given 
any  multiblock  problem,  there  is  in  general  a  lower  bound 
for $N$ under which the FMV method also fails. A way to avoid this difficulty is to approximate $H$ arbitrarily closely with a finitely supported sequence (e.g., $P_k H$). Such an approach, however, has the effect of increasing the order of the suboptimal solution and therefore the order of the controller that achieves it.

Without  overlooking  these  limitations,  we  are  going  to 
assume  for  the  rest  of  this  subsection  that  the  problems  at 
hand  allow  polynomial  feasible  solutions  and  that  N  is 
large  enough  to  capture  at  least  one  of  such  solutions. 

Under  these  assumptions,  it  is  clear  that  all  but  finitely 
many  constraints  in  (33)  are  satisfied  trivially,  so  that  the 
problem  is  in  effect  a  finite  dimensional  linear  program. 
The  next  theorem  shows  that  it  has  nice  convergence 
properties  [11]. 


Theorem 7.2: In the FMV method, $\bar{\nu}_N \to \mu^0$ as $N \to \infty$.

Besides the necessary assumptions regarding the existence of polynomial feasible solutions, the FMV approximation method suffers from two other significant drawbacks: 1) although it provides an upper bound for $\mu^0$ and a feasible solution that achieves it, it gives no information about how far away from optimal the solution is, and 2) the compensators obtained with this method suffer from order inflation (i.e., the order of the controller increases with $N$). These aspects of the solutions will be illustrated through an example at the end of this section.

B.  The  FME  Approximation  Method 

The first drawback was solved independently in [6] and [33] by introducing a second optimization problem, the FME approximation method. This method further exploits the structure of the matrices $\bar{M}_{ij}$ to get lower bounds on $\mu^0$. The name stems from the fact that only finitely many equality constraints associated with the rank interpolation conditions are included in the optimization problem. The rest are simply ignored. Therefore, the solution obtained will in general fail to satisfy those constraints that were left out, rendering it infeasible to the untruncated problem. A formal statement of the FME approximation problem (in its primal form) is as follows:

$$\underline{\nu}_N := \min_{\mu,\ \xi,\ \phi^+_{ij},\ \phi^-_{ij}} \mu$$

subject to

$$\xi(i) + \sum_{j=1}^{n_w} \sum_{k=0}^{\infty} \bigl(\phi^+_{ij}(k) + \phi^-_{ij}(k)\bigr) = \mu \quad \text{for } i = 1, \ldots, n_z$$

$$\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} M_{ij}(\phi^+_{ij} - \phi^-_{ij}) = b_1$$

$$\Bigl(\sum_{i=1}^{n_z} \sum_{j=1}^{n_w} \bar{M}_{ij}(\phi^+_{ij} - \phi^-_{ij})\Bigr)(k) = b_2(k) \quad \text{for } k = 0, \ldots, N - 1$$

$$\xi,\ \phi^+_{ij},\ \phi^-_{ij} \ge 0. \qquad (34)$$

This truncation scheme transforms the original problem into one with a finite number of constraints but still an infinite number of variables. An argument similar to the one used for the one-block problem shows that the above infinite dimensional linear program is indeed equivalent to a finite dimensional one. Let $\bar{M}_{ij,N}$ denote the truncated $\bar{M}_{ij}$ (i.e., the first $N$ rows of it). Since $\bar{M}_{ij,N}$ has only a finite number of rows, then the combined matrix

$$(M_{ij}^T\ \ \bar{M}_{ij,N}^T)$$

maps a finite dimensional space to $l_\infty$. Moreover, due to the band structure of $\bar{M}_{ij}$, all the columns of the combined matrix are in $c_0$ and thus the range is in $c_0$. Therefore, by Lemma 6.1 and Theorem 6.1, the FME problem is equivalent to a finite dimensional linear program whose solution has finite support.

The sequence of linear programs in (34) is such that the number of constraints increases with $N$. Therefore, $\underline{\nu}_N$ forms a nondecreasing sequence bounded from above by $\mu^0$. The next theorem shows that it actually converges to $\mu^0$ [34].

Theorem 7.3: In the FME method, $\underline{\nu}_N \to \mu^0$ as $N \to \infty$.

Based  on  these  convergence  properties,  a  multiblock 
problem  can  be  solved  iteratively  to  any  degree  of  approx¬ 
imation  by  solving  two  finite  dimensional  linear  programs, 
corresponding  to  the  FMV  and  FME  truncation  schemes, 
at  each  iteration.  The  stopping  criterion  is  based  on  the 
upper and lower bounds provided in each iteration. This holds only if there exist finitely supported feasible solutions to the problem.
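The iterative scheme just described amounts to a bracketing loop. The sketch below assumes two hypothetical callables, `upper_bound(n)` and `lower_bound(n)`, that would each solve the FMV and FME linear programs of order `n`; here they are replaced by simple closed-form stand-ins so that the loop can be run, and no actual linear program is solved.

```python
def bracket_optimum(upper_bound, lower_bound, tol, n_max=1000):
    """Increase the order n until the gap between the two bounds is at most tol."""
    for n in range(1, n_max + 1):
        hi = upper_bound(n)   # stand-in for the FMV value of order n
        lo = lower_bound(n)   # stand-in for the FME value of order n
        if hi - lo <= tol:
            return n, lo, hi
    raise RuntimeError("gap did not close within n_max iterations")


# Stand-in bounds converging to 1.0 from above and from below.
n_stop, lo, hi = bracket_optimum(
    upper_bound=lambda n: 1.0 + 1.0 / n,
    lower_bound=lambda n: 1.0 - 1.0 / n,
    tol=0.25)
print(n_stop, lo, hi)
```

On exit the optimal value is known to lie between `lo` and `hi`, so `tol` directly controls the accuracy of the approximation.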

C.  Delay  Augmentation  Method 

In what follows, a new method called delay augmentation (DA) is presented. This method provides a conceptually attractive and computationally efficient way of solving general multiblock problems, with the added benefit of not requiring assumptions on the existence of polynomial feasible solutions and with the capacity of generating suboptimal controllers without order inflation.

The  main  idea  is  very  simple. 

1)  augment  U  and  V  with  pure  delays  (i.e.,  right 
shifts)  such  that  the  augmented  problem  is  one-block, 

2)  apply  all  the  machinery  developed  for  one-block 
problems  to  the  augmented  system, 

3)  reduce  it  back  to  the  original  system  and  compute 
the  controller. 

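In more precise terms, partition the original system as follows:

$$\begin{pmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{21} & \Phi_{22} \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} - \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} Q\, (V_1\ \ V_2) \qquad (35)$$

where $U_1 \in l_1^{n_u \times n_u}$ and $V_1 \in l_1^{n_y \times n_y}$. Then, augment $U$ and $V$ with $N$th order shifts and augment the free parameter $Q$ accordingly:

$$\begin{pmatrix} \Phi_{11,N} & \Phi_{12,N} \\ \Phi_{21,N} & \Phi_{22,N} \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} - \begin{pmatrix} U_1 & 0 \\ U_2 & S_N \end{pmatrix} \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \begin{pmatrix} V_1 & V_2 \\ 0 & S_N \end{pmatrix} \qquad (36)$$

or, equivalently,

$$\Phi_N := H - U_N Q_N V_N =: H - R_N \qquad (37)$$

where $U_N$, $Q_N$, and $V_N$ have the obvious definitions. Clearly, problem (37) is of the one-block class since $U_N \in l_1^{n_z \times n_z}$ and $V_N \in l_1^{n_w \times n_w}$. By expanding (36) we have

$$\Phi_N = H - U Q_{11} V - S_N R_N \qquad (38)$$

$$R_N := \begin{pmatrix} 0 & U_1 Q_{12} \\ Q_{21} V_1 & Q_{21} V_2 + U_2 Q_{12} + S_N Q_{22} \end{pmatrix}$$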

where the fact that these are all time invariant operators has been used. With this notation we are ready to define the delay augmentation problem of order $N$ as the following optimization problem:

$$\underline{\mu}_N := \inf_{Q_N} \|H - U_N Q_N V_N\|_1. \qquad (39)$$

It follows from the above definition that $\underline{\mu}_N$ is a lower bound for $\mu^0$ since

$$\underline{\mu}_N \le \inf_{Q_{12} = Q_{21} = Q_{22} = 0} \|H - U_N Q_N V_N\|_1 = \inf_{Q_{11} \in l_1^{n_u \times n_y}} \|H - U Q_{11} V\|_1 = \mu^0.$$

In other words, the extra degree of freedom in the free parameter $Q_N$ (as compared to $Q$) makes the construction of superoptimal solutions possible. Such solutions, however, are clearly infeasible to the unaugmented problem. Also, it is interesting to note that the extra parameters (namely $Q_{12}$, $Q_{21}$, and $Q_{22}$) have no effect on the solution $\Phi_N(k)$ for $k < N$ due to the presence of the shift operator in (38). And even more interesting, the term $\Phi_{11,N}$ is not affected at all by the added parameters (note the block of zeros in $R_N$). This observation will let us construct a suboptimal feasible solution directly from the solution of (39).
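The effect of the shift operator can be seen on scalar sequences. In this sketch `base` stands in for one entry of $H - UQ_{11}V$ and `r_a`, `r_b` for two different choices of the corresponding entry of $R_N$; all values are invented, and `shift` models $S_N$ as a right shift by $N$ samples.

```python
def shift(seq, n):
    """Right shift by n samples: prepend n zeros."""
    return [0.0] * n + list(seq)


def subtract(a, b):
    """Entrywise a - b for sequences of possibly different lengths."""
    m = max(len(a), len(b))
    a = list(a) + [0.0] * (m - len(a))
    b = list(b) + [0.0] * (m - len(b))
    return [x - y for x, y in zip(a, b)]


N = 3
base = [1.0, 0.5, 0.25, 0.125]   # stand-in for an entry of H - U*Q11*V
r_a = [3.0, -1.0]                # one choice of the extra term
r_b = [-7.0, 2.0, 4.0]           # a different choice

phi_a = subtract(base, shift(r_a, N))
phi_b = subtract(base, shift(r_b, N))
print(phi_a[:N], phi_b[:N])
```

The first $N$ samples coincide with those of `base` whatever the extra term is; the two responses differ only from sample $N$ onward.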

Given some positive integer $N$, let

$$\underline{\mu}_N = \|\Phi^0_N\|_1 = \|H - U Q^0_{11} V - S_N R^0_N\|_1$$

then, clearly

$$\mu^0 = \inf_{Q \in l_1^{n_u \times n_y}} \|H - UQV\|_1 \le \|H - U Q^0_{11} V\|_1 =: \bar{\mu}_N. \qquad (40)$$

Or, equivalently, the solution obtained by making the extra free parameters zero after solving (39) is feasible and suboptimal to the unaugmented problem. The following lemma summarizes these results.

Lemma 7.1: Given a positive integer $N$ and definitions (39) and (40), then

$$\underline{\mu}_N \le \mu^0 \le \bar{\mu}_N$$

where $\bar{\mu}_N$ is achieved with $Q^0_{11}$.

Before  addressing  the  convergence  properties  of  this 
method,  a  word  on  existence  is  in  order.  Recall  that 
existence  is  assured  if  there  are  no  zero  interpolations  on 
the  boundary  of  the  unit  disk.  Now,  it  may  happen  that  a 
multiblock  problem  that  satisfies  this  condition  augments 
into a one-block problem that does not. Indeed, notice that the left zeros of $U_N$ are given by the left zeros of $U_1$ plus a multiple zero at the origin (due to the block of delays, $\lambda^N I$, resulting from the $\lambda$-transform of $S_N$). Clearly, the left zeros of $U$ are also left zeros of $U_1$. However, $U_1$ may have more zeros, possibly on the boundary of the disk. For example, let

$$U_N(\lambda) = \begin{pmatrix} \lambda - 1 & 0 \\ \lambda - 0.5 & \lambda^N \end{pmatrix}.$$

At $\lambda = 1$ the above matrix loses rank, indicating the existence of a zero at the boundary of the unit disk. However, reordering the outputs before augmenting with delays avoids this difficulty:

$$U_N(\lambda) = \begin{pmatrix} \lambda - 0.5 & 0 \\ \lambda - 1 & \lambda^N \end{pmatrix}.$$

Note that the original $U$ has no left zeros since the rows are coprime.
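The effect of reordering the outputs can be checked by evaluating the determinant of the augmented $2 \times 2$ matrix on the unit circle. The sketch assumes the example has the form $\begin{pmatrix} a(\lambda) & 0 \\ b(\lambda) & \lambda^N \end{pmatrix}$ with $a$, $b$ equal to $\lambda - 1$ and $\lambda - 0.5$ in either order, as read from the example above; the function names and the sampling density are illustrative choices.

```python
import cmath


def det_augmented(lam, n, top, bottom):
    """Determinant of [[top(lam), 0], [bottom(lam), lam**n]]."""
    u = [[top(lam), 0.0], [bottom(lam), lam ** n]]
    return u[0][0] * u[1][1] - u[0][1] * u[1][0]


def min_abs_det_on_circle(n, top, bottom, samples=360):
    """Smallest |det| over equally spaced points of the unit circle."""
    return min(
        abs(det_augmented(cmath.exp(2j * cmath.pi * k / samples), n, top, bottom))
        for k in range(samples))


def zero_at_one(lam):
    return lam - 1.0


def zero_at_half(lam):
    return lam - 0.5


n = 3
original = min_abs_det_on_circle(n, zero_at_one, zero_at_half)
reordered = min_abs_det_on_circle(n, zero_at_half, zero_at_one)
print(original, reordered)
```

In the original ordering the determinant vanishes at $\lambda = 1$, while after reordering it stays bounded away from zero on the sampled circle, consistent with the zeros moving to $\lambda = 0.5$ and the origin.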

The same applies to the right zeros of $V$. In many instances this situation may be reversed by a proper reordering of inputs and outputs, such that the resulting $U_1$ and $V_1$ have no zeros on the boundary. In any case, this limitation has few practical implications since it is always possible to find rational solutions to (39) that are arbitrarily close to $\underline{\mu}_N$. In view of this, we will make the following simplifying assumption.

Assumption 2: $U_1(\lambda)$ and $V_1(\lambda)$ have no zeros on the unit circle.

Note  that  under  this  assumption  the  results  of  Theorem 
3.1  are  applicable.  Furthermore,  in  the  analysis  that  fol¬ 
lows  we  will  be  able  to  exploit  the  existence  of  optimal 
solutions  for  any  N  and  thus  avoid  the  epsilon-delta 
arguments  that  would  result  from  rational  approxima¬ 
tions. 

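By definition, problem (39) is equivalent to the following primal-dual pair:

$$\underline{\mu}_N = \min_{R_N \in \mathcal{S}_N} \|H - R_N\|_1 = \sup_{\substack{G_N \in \mathcal{S}_N^\perp \\ \|G_N\|_\infty \le 1}} \langle H, G_N\rangle. \qquad (41)$$

It is easy to see that, as $N$ increases, the subspace $\mathcal{S}_N$ gets smaller and such that

$$\mathcal{S}_N \supset \mathcal{S}_{N+1} \supset \cdots \supset \mathcal{S} \qquad (42)$$

since the only change in the interpolation structure is due to a higher multiplicity of the zero at the origin. Therefore, $\underline{\mu}_N$ forms a nondecreasing sequence, bounded from above by $\mu^0$.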

The  next  theorem  states  an  interesting  convergence 
result. 

Theorem 7.4: Given the sequence $\Phi^0_N$, there exists a subsequence that converges weak* to some $\Phi^0$. If the optimal solution is unique then the whole sequence converges weak* to it.

Proof: Clearly $\Phi^0_N$ forms a bounded sequence in $l_1^{n_z \times n_w}$; then there exists a weak*-convergent subsequence $\Phi^0_{N_i}$, by the Banach-Alaoglu theorem. Let $\Phi^{w*}$ denote such limit point. As mentioned before, $\Phi^0_N$ is infeasible to the original (unaugmented) problem. However, we will show that $\Phi^{w*}$ is in fact feasible. From (38), after taking the weak* limit, we have:

$$\Phi^{w*} = H - (U Q^0_{11} V)^{w*} - (S_N R^0_N)^{w*} = H - U (Q^0_{11})^{w*} V$$

where the superscript $w*$ denotes weak* limit. The last term drops since $R^0_N$ is uniformly bounded in $N$. For if $\{R^0_N\}$ were unbounded, then $\{Q^0_N\}$ would necessarily be unbounded to keep $\mu_N$ bounded. But this contradicts the fact that $\mu_N$ is larger than … Therefore, $\Phi^{w*}$ is feasible. To show that $\Phi^{w*}$ is actually an optimal solution, we need to view $\Phi^0_{N_i}$ as a bounded linear operator from $c_0^{n_z \times n_w}$ to $\mathbb{R}$ (i.e., bounded linear functional on $c_0^{n_z \times n_w}$) with strong operator limit $\Phi^{w*}$. In such context we have the following inequality (see [25], p. 269):

$$\|\Phi^{w*}\|_1 \le \liminf_{i \to \infty} \|\Phi^0_{N_i}\|_1 \le \|\Phi^0\|_1.$$

Therefore, since $\Phi^{w*}$ is feasible, all inequalities above are in fact equalities and $\Phi^{w*} = \Phi^0$.

Finally, if the solution is unique then the whole sequence converges to $\Phi^0$ weak*. ■

The last claim in the above theorem simply reflects the fact that if there are several optimal solutions, $\Phi^0$, then a sequence of DA problems can be such that $\Phi^0_N$ (in the limit) "jumps" from one optimal solution to the other, therefore not converging as a whole. Then, a subsequence that "keeps track" of a single optimal solution will converge weak* to it. This technicality is unnecessary when the optimal solution is unique.

An immediate corollary to Theorem 7.4 is the following.

Corollary 7.2: The sequence of lower bounds, $\underline{\mu}_N$, converges to $\mu^0$ as $N \to \infty$.

Next,  we  focus  on  the  convergence  properties  of  the 
dual  sequence  GN.  In  the  context  of  (41)  we  state  the 
following  Theorem.  (Note  that  G°N  as  well  as  G°  may  not 
be  unique). 

Theorem  7.5:  Given  the  sequence  Gf,  there  exists  a 
subsequence  that  converges  weak*  in  /g-'xn"  to  an  opti¬ 
mal  solution  G°.  Furthermore,  if  the  solution  G°  is  unique, 
then  the  whole  sequence  converges  weak*  to  it. 

Proof: Clearly the sequence $G^o_N$ is bounded by one.
Then, by the Banach-Alaoglu theorem, there exists a
subsequence that converges weak* in $\ell_\infty^{n_z \times n_w}$. Also, from
(42) we have that

^cy;tlc  -  c^. 

Or, equivalently, $G^o_N$ is feasible to the original (dual)
problem for all $N$. Further, it can be shown that the
feasible subspace is weak*-closed [11], [29], so $G^o_{N_i}$
converges weak* to a feasible limit point, say $G^{w*}$. Therefore,

$$\underline{\mu}_{N_i} = \langle H, G^o_{N_i} \rangle \to \langle H, G^{w*} \rangle.$$

But, by Corollary 7.2, $\underline{\mu}_{N_i} \to \mu^o$, thus, $\mu^o = \langle H, G^{w*} \rangle$.

This implies that $G^{w*}$ is in fact an optimal dual solution,
$G^o$, since it achieves the optimal value and is feasible.

If the solution, $G^o$, is unique then the whole sequence
converges weak* to it. ■

Next, we focus our attention on the sequence of suboptimal
solutions that attain the upper bound $\overline{\mu}_N$. Let
$\bar{\Phi}_N := H - U Q^o_N V$, then $\overline{\mu}_N = \|\bar{\Phi}_N\|_1$ by definition. It is
easy to see that $\bar{\Phi}_N$ forms a bounded sequence in $\ell_1^{n_z \times n_w}$
(if not, $\Phi^o_N$ and thus $\underline{\mu}_N$ would be unbounded). Therefore,
there exists a subsequence that converges weak* in
$\ell_1^{n_z \times n_w}$. $\bar{\Phi}_N$ is clearly feasible to the original problem
for any $N$, and since the feasible set is weak*-closed [11], all
weak* limit points are feasible. The question is whether
or not the subsequence $\overline{\mu}_{N_i} = \|\bar{\Phi}_{N_i}\|_1$ converges to $\mu^o$ in
general.

In order to give a proper answer to this question, it is
useful to make the following observation first made in
[33]. In Corollary 6.2 we have shown that most one-block
problems have optimal solutions with all row norms equal
to $\mu^o$. To illustrate why this is not the case with multiblock
problems, consider the following SISO example:

$$\phi_1 = h_1 - u_1 q$$

where all operators are in $\ell_1$ and $u_1(\lambda)$ has no zeros on
the unit circle. Let $\phi_1^o$ denote an $\ell_1$-optimal solution to
such (one-block) problem, that is achieved with $q^o$. Next,
add a new row to the problem,

$$\phi_2 = h_2 - u_2 q$$

such that $\|h_2 - u_2 q^o\|_1 < \mu^o$ (this is always possible simply
by choosing a scalar weight on the second row of small
enough value). Then, it is clear that an optimal solution to
the new two-block column problem is still given by $q^o$ and
that $\|\phi_2^o\|_1 < \|\phi_1^o\|_1 = \mu^o$. In other words, the new row
does not affect the optimal solution, which is given by the
first row alone. In contrast with a one-block problem with
two outputs, a two-block problem with two outputs has to
minimize both outputs with just one scalar free parameter
sequence, $q$. The “shortage” of degrees of freedom is
what makes this situation more common in multiblock
problems.
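
The inactive-row effect can be checked numerically on finitely supported sequences. The following is a minimal Python sketch, assuming the max-row-norm convention for the $\ell_1$ norm of a matrix impulse response; the function name `l1_norm` and the coefficient values are illustrative assumptions, not data from the paper.

```python
def l1_norm(phi):
    """l1 norm of a finitely supported matrix impulse response.

    phi[i][j] is the list of impulse-response coefficients of entry (i, j).
    The norm is the largest row norm, where a row norm sums the absolute
    values of the coefficients of every entry in that row.
    """
    return max(
        sum(abs(c) for entry in row for c in entry)
        for row in phi
    )


# Hypothetical two-block column problem: one column, two rows.
phi_1 = [1.0, -0.5, 0.25]   # first row, row norm 1.75
phi_2 = [0.8, 0.4]          # second row before weighting, row norm 1.2

# Scale the second row by a scalar weight and watch which row sets the norm.
for weight in (2.0, 1.0, 0.1):
    phi = [[phi_1], [[weight * c for c in phi_2]]]
    print(weight, l1_norm(phi))
```

For the two smaller weights the printed norm is set by the first row alone, which is the situation described above: a sufficiently small weight on the second row leaves it inactive.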

Having noted this behavior, we can present the main
theorem concerning convergence of the upper bound.

Theorem 7.6: Given a general multiblock problem, let
$\Phi^o_N$ converge weak* to an optimal solution $\Phi^o = H - U Q^o V$
such that $\|(\Phi^o)_i\|_1 = \mu^o$ for $i \in \{1, \ldots, n_u\}$. Then,
$\bar{\Phi}_N$ converges strongly (i.e., in the norm) to $\Phi^o$ as $N \to \infty$,
and further, $\overline{\mu}_N \to \mu^o$.

Proof: It is a well-known fact that if a sequence
$x_n \in \ell_1$ converges to $x^{w*}$ weak*, and if $\|x_n\|_1 \to \|x^{w*}\|_1$,
then $x_n$ converges to $x^{w*}$ strongly. However, such result is
valid only for scalar and row-vector sequences in $\ell_1$ (it is
easy to think of a counter-example in the general matrix
case). Therefore, we apply it to each individual row of $\Phi^o_N$
to conclude the following: $(\Phi^o_N)_i$ converges strongly (i.e.,
in the norm) to $(\Phi^o)_i$ for all $i$ such that

$$\|(\Phi^o)_i\|_1 = \mu^o.$$

At the same time, from Assumption 2, $U_1$ and $V_1$ have
full normal rank, so the map from $Q_N$ to $\Phi_{11,N}$ is
continuous with continuous inverse, that is

$$Q^o_N = U_1^{-1} (H_{11} - \Phi^o_{11,N}) V_1^{-1}.$$

Then, using the fact that $\|(\Phi^o)_i\|_1 = \mu^o$ for $i \in \{1, \ldots, n_u\}$,
we conclude that $\Phi^o_{11,N}$ converges strongly to $\Phi^o_{11}$, which
in turn implies that $Q^o_N$ converges strongly to $Q^o$ and the
result follows. ■

The above theorem suggests that the construction of
the feasible solution that attains the upper bound, $\bar{\Phi}_N$,
can be viewed as an attempt to compute the weak* limit
of the sequence $\Phi^o_N$ by “throwing away” the tail contained
in the term $S_N R^o_N$.

It should be stressed at this point that nonpathological
multiblock problems have optimal solutions where at least
$n_u$ of the $n_z$ rows achieve the optimal norm (a natural
extension of how optimal solutions of one-block problems
behave). Furthermore, those rows that do not achieve the
optimal norm can be left out of the optimization problem
without affecting the overall solution, so eventually, the
problem can be reduced. In general, however, a well-posed
control problem will tend to have none of its rows
“redundant”, so usually $\overline{\mu}_N$ converges to $\mu^o$ without
further considerations. In this context we have the following
corollary valid for two-block column problems of the
form:


Corollary 7.3: Given a two-block column problem, if
$\|\Phi^o_2\|_1 < \mu^o$ then $\bar{\Phi}_N$ is the exact optimal solution for any
$N$.

Proof: Follows immediately from the fact that the
first block-row $H_1 - U_1 Q_1 V$ is independent of the extra
free parameter. That is,

$$\Phi^o_{1,N} = H_1 - U_1 Q^o_1 V$$
$$\Phi^o_{2,N} = H_2 - U_2 Q^o_1 V - S_N Q^o_2 V.$$

Then, for any $N$ we have

$$\|\Phi^o_{1,N}\|_1 \ge \mu^o \ge \underline{\mu}_N = \max(\|\Phi^o_{1,N}\|_1, \|\Phi^o_{2,N}\|_1).$$

Thus, equality is attained throughout and the result follows,
i.e., $Q^o_1 = Q^o$. ■

Theorem 7.6 and Corollary 7.3 dictate that a reordering
of outputs needs to be done so that the first $n_u$ rows of
$\Phi^o$ achieve the optimal norm $\mu^o$. The question is, then, how
to find a priori which rows of the problem are not going to
achieve the optimal norm. A brute force answer to this
question is simply to solve all possible one-block problems
formed by taking $n_u$ rows out of the given $n_z$ rows. If any
solution is such that all the rows that were left out have
smaller norm than the corresponding $\mu^o$, then those rows
are the inactive ones and should be ordered in $U_2$. (In fact
these rows can be removed altogether.) However, this
approach may require a considerable amount of work. We
will return to this difficulty later.
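
The brute-force search just described can be sketched as follows. This is only an illustration in Python under stated assumptions: `solve_one_block` is a hypothetical user-supplied solver (no such routine is defined in this paper), and `find_inactive_rows` is a name chosen here. The sketch makes the combinatorial cost visible: one one-block problem per choice of $n_u$ rows out of $n_z$.

```python
from itertools import combinations


def find_inactive_rows(n_z, n_u, solve_one_block):
    """Enumerate every choice of n_u rows out of n_z and test the left-out rows.

    solve_one_block(rows) is a hypothetical solver. It must return
    (mu, row_norms): the optimal norm of the one-block problem built from
    `rows`, and the l1 norm of each of the n_z rows of the full problem when
    the optimal free parameter of that one-block problem is applied.
    """
    for rows in combinations(range(n_z), n_u):
        mu, row_norms = solve_one_block(rows)
        left_out = [i for i in range(n_z) if i not in rows]
        # Left-out rows are inactive only if each has strictly smaller norm.
        if all(row_norms[i] < mu for i in left_out):
            return rows, left_out      # (active rows, inactive rows)
    return None                        # no subset leaves only inactive rows
```

A caller would then place the returned active rows first before applying the DA method.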

Two-block row problems show a similar behavior. Indeed,
such problems may have columns that are inactive
in the optimization process in the sense that they can be
removed without affecting the solution. Note that in the
previous case, the phenomenon of inactive rows was intimately
related to the fact that the $\ell_1$ norm on matrices
takes the maximum row norm, which allowed us to easily
construct an example.

If the DA method is applied to a two-block row problem
such that the columns associated with $V_2$ are inactive,
then again the solution $\bar{\Phi}_N$ is exact for any $N$. However,
$\underline{\mu}_N$ will not give the exact optimal norm (although it will
tend to it) since the extra parameter contributes in reducing
the norm of $\Phi^o_{2,N}$.

Finally,  let  us  point  out  that  both  forms  of  redundancy 
(row  and  column)  can  occur  in  a  multiblock  problem 
simultaneously.  This  discussion  motivates  the  following 
definitions. 

Definition 7.1: Given a general multiblock problem, a
one-block partition is defined by taking $n_y$ inputs and $n_u$
outputs of the full problem, such that the reduced problem
corresponds to a one-block problem with full normal
rank $U$ and $V$.

Definition 7.2: In a multiblock problem, a one-block
partition is totally dominant (TD) if the optimal free
parameter $Q^o$ obtained from its solution also solves the
original multiblock optimization problem.

It follows from these definitions that, if there is a TD
one-block partition corresponding to the partitions $U_1$ and
$V_1$, then the DA method provides the exact answer for any
$N$. The next section illustrates some of these properties.

In summary, in the DA method, $\underline{\mu}_N$ always converges to
$\mu^o$, and $\overline{\mu}_N$ converges to $\mu^o$ if the first $n_u$ rows are active.

VIII.  A  Comparison  of  Methods 

This section provides a general comparison of the
approximation methods presented, based on a few simple
multiblock examples. To facilitate further study, the first
two selected problems are the same as those treated in
other references [11], [33]. Particular attention will be
paid to two aspects of the solutions: first, the support
characteristics of the sequence of solutions, and second,
the order of the suboptimal controller they generate.

Example I: Consider the following two-block column
problem: given the SISO plant $P$, minimize the $\ell_1$ norm of
the weighted sensitivity and complementary sensitivity,

$$\Phi = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} W_1 (1 - PK)^{-1} \\ W_2 PK (1 - PK)^{-1} \end{pmatrix}$$

where

$$P(\lambda) = \frac{\lambda(\lambda - 0.5)}{(\lambda - 0.1)(1 - 0.5\lambda)}, \qquad W_1 = 0.02 \ldots, \qquad W_2 = 0.004\rho \ldots$$

Note that a variable scalar weight on $\phi_2$, denoted $\rho$, has
been included. By adjusting $\rho$, we will be able to generate
two interesting cases: case a) where $\phi_1$ is TD (for “small”
$\rho$) and case b) where both rows are active in the optimization
(for “intermediate” $\rho$). The workings of Theorem 7.6
will be illustrated by reordering the outputs and forcing
the TD row to be in the “wrong” place.

The results are presented in tables showing, for each $N$,
the DA lower bound ($\underline{\mu}_N$), the DA upper bound ($\overline{\mu}_N$)
and the FMV upper bound ($\overline{\nu}_N$). The FME lower bound
is omitted since it is equal to $\underline{\mu}_N$ in this particular case. In
general, however, $\underline{\mu}_N$ converges faster than $\underline{\nu}_N$ since the
delay augmentation method generates more constraints
than the FME method for any given $N$. These extra
constraints are the ones that ensure feasibility of $\bar{\Phi}_N$.
To illustrate this point, consider the following case:

$$\begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} - \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} q$$

where $u_1(\lambda)$ and $u_2(\lambda)$ are coprime. Further, assume that
$u_1(\lambda)$ has an unstable zero at $\lambda_0$. Consequently, the FME
method generates the following rank constraints (note
there are no left zeros of $U$):

$$(\phi_1 * u_2 - \phi_2 * u_1)(k) = (h_1 * u_2 - h_2 * u_1)(k), \qquad k = 0, \ldots, N - 1. \quad (43)$$

Now consider the DA method of order $N$:

$$U_N = \begin{pmatrix} u_1 & 0 \\ u_2 & \lambda^N \end{pmatrix}.$$

Let us construct the left zero interpolations for this $U_N$.
Multiplying $U_N$ on the left by $(u_2 \;\; -u_1)$ we get $(0 \;\; -u_1 \lambda^N)$.
This implies that the left zeros of $U_N$ are given by the
zeros of $u_1$ and a zero at the origin of multiplicity $N$.
Further, the directional properties of such zero are captured
by the vector $(u_2 \;\; -u_1)$. Therefore, the zero interpolation
conditions are given by (43) plus the following:

$$\phi_1(\lambda_0) = h_1(\lambda_0).$$

Note that this last constraint becomes redundant as $N \to \infty$.

In this particular numerical example, however, both
lower bounds are equal due to the fact that the unstable
zeros of $u_1(\lambda)$ are also zeros of $u_2(\lambda)$.

Also included are the support characteristics of $\Phi^o_N$ and
of the FMV solution along with the order of the suboptimal
controllers that achieve the corresponding upper
bounds.

To describe the support characteristics we define a
function, len(·), mapping $\ell_1^{n \times m}$ to $\mathbb{Z}_+^{n \times m}$ in the following
way: given $\Phi \in \ell_1^{n \times m}$, then $[\mathrm{len}(\Phi)]_{ij}$ is a nonnegative
integer equal to the maximum $k$ for which $\phi_{ij}(k)$ is not
zero, plus one. Also, we denote the order of a controller $K$
by ord($K$).
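
As an illustration, the len(·) map can be sketched in Python for finitely supported sequences stored as coefficient lists. The names `length` and `len_matrix`, the tolerance parameter `tol`, and the convention that the all-zero sequence has length 0 are assumptions made for this sketch; the definition above does not cover the zero sequence.

```python
def length(seq, tol=0.0):
    """Largest index k with |seq[k]| > tol, plus one.

    Returns 0 for a sequence with no coefficient above tol (an assumed
    convention). A small positive tol is useful when the coefficients come
    from a numerical solver.
    """
    for k in range(len(seq) - 1, -1, -1):
        if abs(seq[k]) > tol:
            return k + 1
    return 0


def len_matrix(phi, tol=0.0):
    """Apply `length` entrywise; phi[i][j] is the coefficient list of entry (i, j)."""
    return [[length(entry, tol) for entry in row] for row in phi]
```

For instance, a two-row, one-column response with coefficient lists `[1, 0, 2, 0]` and `[0, 0]` has lengths 3 and 0.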

Case a): In this case let $\rho = 1$ and keep the same
ordering of outputs as above (i.e., sensitivity first). The
results are shown in Table I. Clearly, the solution given by
the delay augmentation method is exact since the upper
TABLE I
Comparison of Methods: Example I, Case a) Where the First Row is TD

                          DA                                                 FMV
N   $\underline{\mu}_N$   $\overline{\mu}_N$   len($\Phi^o_N$)$^T$   ord($K$)   $\overline{\nu}_N$   len($\Phi_N$)$^T$   ord($K$)
1   0.78222               0.78222              (3 2)                 2          -                    -                   -
2   0.78222               0.78222              (3 3)                 2          -                    -                   -
3   0.78222               0.78222              (3 4)                 2          1.31912              (4 4)               4
4   0.78222               0.78222              (3 5)                 2          0.97459              (5 5)               5
5   0.78222               0.78222              (3 6)                 2          0.87547              (6 6)               6
6   0.78222               0.78222              (3 7)                 2          0.83292              (7 7)               7

and lower bounds are equal for any $N$. Then, in the
context of Corollary 7.3, the first row corresponding to the
weighted sensitivity is TD. Indeed, a simple computation
shows that $\|\Phi^o_2\|_1 = 0.2040 < \|\Phi^o_1\|_1 = 0.7822$. Note how
the support of the second row of the augmented optimal
solution increases with $N$ while the first row remains
constant and equal to the optimal of the un-augmented
problem. Since the controller is computed from the first
row only, it is also exact and constant as $N$ increases. In
contrast, the FMV solution has increasing support on
both rows, thus generating a suboptimal controller of
increasing order that approximates the second order optimal
controller. Note that for some $N$'s, the FMV problem
has no solution (indicated with a dash) since the feasible
set is empty.

Next, consider the same problem but with the outputs
reordered (i.e., the complementary sensitivity in the first
row). Table II shows how violating the conditions of
Theorem 7.6 affects the convergence of the upper bound
(note that the lower bound does converge as shown in
Theorem 7.4). Although the upper bound does not converge,
it is interesting to note that for $N \ge 2$ the length of
$\Phi^o_{2,N}$ (i.e., the weighted sensitivity) locks at a value of 3,
which coincides with the length of the optimal solution.
This seems to be a general characteristic of the DA
method as we shall see later. At the same time, there is a
clear order inflation on the suboptimal controller due to
the constant increase in the length of $\Phi^o_{1,N}$. (Note: FMV
results are not included in Table II since such method is
not affected by reordering.)

Case b): Let $\rho = 6$ and place the sensitivity back in the
first row. For this weighting, both rows are active in the
optimization as shown by the gradual convergence of the
upper and lower bound (see Table III). Note that, even
though the controller order growth is comparable in both
methods, the support characteristics are quite different.
Most interesting, the length of $\Phi^o_{2,N}$ remains equal to 4
for $N > 2$, suggesting the possibility that, by changing the
order of the outputs, a low order suboptimal controller
can be computed. This is in fact the case, as shown in
Table IV. (This procedure does not apply to the FMV
method since the suboptimal solutions obtained by this
method are such that all entries of $\Phi_N$ are supported at
$k = N$.) It is interesting how in both cases a) and b), a
proper ordering of the outputs results in a much better
approximation of the solution (exact if one row is TD) in
the sense that, after some $N$, the sequence of suboptimal

TABLE II
Comparison of Methods: Example I, Case a) Where the Second Row is TD

N    $\underline{\mu}_N$   $\overline{\mu}_N$   len($\Phi^o_N$)$^T$   ord($K$)
1    0.22000               1.1602               (3 2)                 2
2    0.29195               1.9939               (4 3)                 4
3    0.42826               3.1464               (5 3)                 5
4    0.55995               3.9859               (6 3)                 6
5    0.65664               4.5189               (7 3)                 7
6    0.71550               4.8077               (8 3)                 8
7    0.74789               4.9504               (9 3)                 9
8    0.76483               5.0171               (10 3)                10
15   0.78159               5.1878               (15 3)                15

TABLE III
Comparison of Methods: Example I, Case b) Where no Row is TD

                          DA                                                 FMV
N   $\underline{\mu}_N$   $\overline{\mu}_N$   len($\Phi^o_N$)$^T$   ord($K$)   $\overline{\nu}_N$   len($\Phi_N$)$^T$   ord($K$)
1   0.78222               1.2243               (3 2)                 2          -                    -                   -
2   0.79333               1.2547               (4 3)                 3          -                    -                   -
3   0.90230               1.5255               (4 4)                 5          1.3191               (4 4)               3
4   0.99522               1.0389               (5 4)                 5          1.0564               (5 5)               4
5   1.0015                1.0105               (6 4)                 6          1.0121               (6 6)               6
6   1.0024                1.0043               (7 4)                 7          1.0044               (7 7)               7
7   1.0026                1.0030               (8 4)                 8          1.0030               (8 8)               8
8   1.0026                1.0027               (9 4)                 9          1.0027               (9 9)               9

TABLE IV
Comparison of Methods: Example I, Case b) with the Outputs Reordered

N   $\underline{\mu}_N$   $\overline{\mu}_N$   len($\Phi^o_N$)$^T$   ord($K$)
1   0.95745               1.1602               (3 2)                 2
2   0.95745               1.1602               (3 3)                 2
3   0.98658               1.0586               (4 4)                 3
4   0.99889               1.0157               (4 5)                 3
5   1.0019                1.0053               (4 6)                 3
6   1.0022                1.0031               (4 7)                 3
7   1.0026                1.0027               (4 8)                 3
8   1.0026                1.0026               (4 9)                 3

controllers are of fixed order and asymptotically approaching
the optimal one. This is not an isolated case.
Many other multiblock problems for which reliable numerical
approximations were computed behave in this way
when solved by the DA method. In other words, given a
general multiblock problem, there seems to be a one-block
partition that preserves a polynomial optimal solution,
and further, such support structure is eventually captured
by the delay augmentation method for a large enough $N$.
Then, a proper ordering of inputs and outputs that places
the one-block partition in the first $n_u$ rows and $n_y$ columns
of $\Phi$ (corresponding to $U_1$ and $V_1$) will generate a sequence
of suboptimal controllers without order inflation.

These  observations  suggest  that  an  efficient  algorithm 
for  computing  low  order  suboptimal  controllers  can  be  as 
follows:  given  a  general  multiblock  problem, 

Step 1) Pick a positive integer $N$.

Step 2) Solve the corresponding delay augmentation
problem.

Step 3) Compute len($\Phi^o_N$) and reorder inputs such that
the set of $n_u \times n_y$ input-output pairs of minimum length
correspond to $\Phi_{11}$.

Step 4) If reordering was necessary in Step 3), solve the
reordered system for the same $N$. Then, check the difference
between the upper and lower bounds, i.e., $\overline{\mu}_N - \underline{\mu}_N$.
If such difference is small enough stop, otherwise increase
$N$ by one (or more) and go to Step 2).
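
Steps 1) to 4) amount to a simple driver loop, sketched below in Python. This is not the authors' implementation: `solve_da` and `choose_order` are hypothetical callbacks standing in for a delay augmentation solver and for the reordering rule of Step 3), and the parameters `tol`, `n_max`, and `step` are assumptions added so that the loop terminates.

```python
def da_low_order_search(solve_da, choose_order, order, n_start=1,
                        tol=1e-3, n_max=50, step=1):
    """Driver loop for Steps 1) to 4).

    solve_da(N, order) -> (lower, upper, lengths) is a hypothetical solver
    for the delay augmentation problem of order N with the given
    input/output ordering. choose_order(lengths, order) -> ordering is a
    hypothetical rule that moves the minimum-length entries into the
    Phi_11 block (it returns `order` unchanged if no reordering is needed).
    """
    N = n_start                                      # Step 1
    while N <= n_max:
        lower, upper, lengths = solve_da(N, order)   # Step 2
        new_order = choose_order(lengths, order)     # Step 3
        if new_order != order:                       # Step 4: re-solve, same N
            order = new_order
            lower, upper, lengths = solve_da(N, order)
        if upper - lower <= tol:                     # bounds close enough
            return N, order, lower, upper
        N += step
    return None    # gap never fell below tol for N <= n_max
```

Returning the final ordering together with the bounds lets a caller recover the suboptimal controller from the last solve without repeating the search.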

In  order  to  illustrate  the  workings  of  such  algorithm  we 
include  a  four-block  example. 

Example II: Consider the following 2-input-2-output
four-block problem where the regulated signals are the
output of the plant and the control sequence (weighted
with the scalar $\rho$), and the input disturbances are a
disturbance at the plant output with frequency weighting
$W_1(\lambda)$ and measurement noise with frequency weighting
$W_2(\lambda)$. That is,

$$\Phi = \begin{pmatrix} (1 - PK)^{-1} W_1 & PK(1 - PK)^{-1} W_2 \\ \rho K(1 - PK)^{-1} W_1 & \rho K(1 - PK)^{-1} W_2 \end{pmatrix}$$

where

$$W_1(\lambda) = \frac{0.4}{1 - 0.6\lambda}, \qquad W_2(\lambda) = \frac{1 - 0.75\lambda}{1 - 0.25\lambda},$$

$\rho = 0.1$ and $P(\lambda)$ is the same as in Example I. Then, the
results in Table V are obtained by applying the above
algorithm starting with $N = 3$. For $N = 10$, the suboptimal
controller is of order five and achieves a norm that is
within half a percent of the optimal. (The jump in order is
most likely due to convergence to another optimal solution.)
In contrast, it can be shown that the FMV method
has no polynomial feasible solution for any $N$ (due to the
way $W_1$ and $W_2$ enter the problem). This example shows
how the delay augmentation algorithm can generate low
order suboptimal controllers even when the FMV method
has no solution.


IX. Support Structure of Optimal Solutions

Here we explore the support characteristics of the
optimal solution in more general terms. The numerical
examples in the previous section suggest that it may be
possible to infer the support of the optimal solution by
observing how the superoptimal solutions, $\Phi^o_N$, evolve as
$N$ increases. Here we make an important step in this
direction by showing that such support structure is “hinted
to” by the support of the sequence of superoptimal solutions.

We have already shown that, given a multiblock problem,
there exists a subsequence of superoptimal dual
solutions, $G^o_{N_i}$, whose weak* limit point, $G^o$, is feasible
and optimal (Theorem 7.5). By exploiting this result in
combination with the alignment conditions, we will show
that the finitely supported partition of the optimal solution
is eventually “captured” by the sequence of superoptimal
solutions. For that purpose we need the following
well known lemma.


TABLE V
Example II: Delay Augmentation Algorithm

                          DA                                                                     FMV
N   $\underline{\mu}_N$   $\overline{\mu}_N$   len($\Phi^o_N$)   ord($K$)   Comments         $\overline{\nu}_N$   ord($K$)
3   60.453                102.34               …                 4          Reorder inputs   -                    -
3   60.400                81.161               …                 2          Keep order       -                    -
4   64.702                81.161               …                 2          "                -                    -
5   68.284                81.161               …                 2          "                -                    -
6   70.721                72.850               …                 5          "                -                    -
7   70.754                71.874               …                 5          "                -                    -
8   70.888                71.500               …                 5          …                …                    …
Lemma 9.1: If a sequence $G_N \in \ell_\infty^{n \times m}$ converges weak*
to $G$, then for any positive integer $L < \infty$, $\|P_L(G_N - G)\|_\infty \to 0$
as $N \to \infty$.

Note that the above lemma implies that each individual
entry of $G_N$ also enjoys this convergence property, i.e.,
$\|P_L(g_{ij,N} - g_{ij})\|_\infty \to 0$ as $N \to \infty$, for all $i = 1, \ldots, n$ and
$j = 1, \ldots, m$.

Next, let us review the alignment properties of the
optimal solutions. Optimality implies that each optimal
solution to the primal problem must be aligned with every
optimal solution to the dual problem. In particular, if an
optimal dual solution, $G^o$, is such that

$$|g^o_{ij}(t)| < \max_{1 \le j \le n_w} \|g^o_{ij}\|_\infty \quad \text{for all } t > T$$

then all optimal primal solutions are such that $\phi^o_{ij}(t) = 0$
for $t > T$. Note that, according to the notation developed
in Section V, $\max_{1 \le j \le n_w} \|g_{ij}\|_\infty$ is nothing but $\gamma^o_0(i)$. The
next theorem puts all these pieces together.

Theorem 9.1: Given a multiblock problem, if all optimal
dual solutions are such that $|g^o_{ij}(T)| = \gamma^o_0(i)$ for some
$T \in \mathbb{Z}_+$ and $|g^o_{ij}(t)| < \gamma^o_0(i)$ for all $t > T$ then, for every
$L > T$ there exists a positive integer $N^*$ such that
$\phi^o_{ij,N}(t) = 0$ for $T < t \le L$ and for any $N > N^*$.

Proof: (Note: to simplify notation we drop subindexes
$i$, $j$ and superindex ‘o’.) Given some $L > T$, pick $\epsilon > 0$
such that

$$\min_{T < t \le L} (\gamma_0 - |g(t)|) = \epsilon. \quad (44)$$

By Lemma 9.1, for every $L > T$ there exists $N^*$ such that

$$\|P_L(g_N - g)\|_\infty < \frac{\epsilon}{2} \quad (45)$$

for all $N > N^*$. First we prove (by contradiction) that
$|g_N(t)| < \gamma_{0,N}$ for $T < t \le L$ and for any $N > N^*$. The
result then follows from the alignment conditions.

Given $N > N^*$, assume that $|g_N(t_1)| = \gamma_{0,N}$ for some $T < t_1 \le L$.
Then, by (44) and (45),

$$\gamma_{0,N} - \gamma_0 \le |g_N(t_1)| - |g(t_1)| - \epsilon < \frac{\epsilon}{2} - \epsilon.$$

Therefore,

$$\gamma_0 - \gamma_{0,N} > \frac{\epsilon}{2}. \quad (46)$$

Next, consider the point $t = T$. From (45) and the fact
that $|g_N(T)| \le \gamma_{0,N}$ in general, we have

$$\gamma_0 - \gamma_{0,N} \le |g(T)| - |g_N(T)| < \frac{\epsilon}{2}$$

which contradicts (46). This implies that $\phi_N(t) = 0$ for
$T < t \le L$ and $N > N^*$, which is the desired result. ■

In other words, given the conditions of the theorem
above, and for $N$ large enough, there is a “gap” of zeros
(between $T$ and $L$) in $\phi^o_{ij,N}(t)$ which gets wider as $N$
increases, i.e., as $L$ increases. However, $T$ does not change
for $N$ large enough, giving a clue on the length of the
finitely supported entries of $\Phi^o$. The difficulty is that we
do not have an a priori estimate of how large $N$ has to be
to capture $T$.

It is worth pointing out that Theorem 9.1 can be applied
to the FMV sequence of suboptimal solutions too,
since the corresponding duals also have a weak* convergent
subsequence [34]. However, there is an important
difference in the way the DA and FMV sequences of
solutions behave, which was pointed out in the previous
section. Indeed, while the FMV solutions are consistently
supported for $t > L$, the DA solutions are not. This
observation was crucial in constructing low order suboptimal
controllers. We expand these ideas in the following
section.

X.  Observations 

This section includes a few observations based on a fair
amount of computational experience using the delay augmentation
method and on some intuitive ideas on the
problem of $\ell_1$ optimization in general. It is by no means a
formal or precise presentation. It is simply intended to
give some lead into new ideas that might open the way to
finding the exact solution of multiblock problems in general.
In particular, a conjecture is stated, establishing a
stronger connection between the support structure of the
optimal solution and the DA method.

Observe the way the DA method works. It transforms a
general multiblock problem into a square one, therefore
generating polynomial superoptimal solutions, $\Phi^o_N$. Without
changing the order of inputs and outputs, the sequence
$\Phi^o_N$ will increase its length as $N$ increases. However,
it was noted in previous examples that not every
entry of $\Phi^o_N$ increases its length in the same way. In fact,
a closer look at the sequence $\Phi^o_N$ suggests that the support
of some of its entries stops changing after some $N$.
This is exactly what happened in Example I, cases a) and
b), where the support of one of the entries of $\Phi^o_N$ remained
the same after some $N$ regardless of the ordering.

In Example II, the pattern also occurs but for $N > 12$
(not shown in Table V). Next, note that $\bar{\Phi}_{11,N} = \Phi^o_{11,N}$
since that block of the problem is not affected by the
extra free parameters. Therefore, for each $N$, $\bar{\Phi}_{11,N}$ is
polynomial. Then, if those entries of $\Phi^o_N$ that have constant
support after some $N$ are collected (by reordering)
in $\Phi^o_{11,N}$, $\bar{\Phi}_{11,N}$ will have constant support. Interestingly,
those entries of constant support seem to be always enough
to define a one-block partition and therefore fill the
necessary entries of $\Phi_{11}$. Furthermore, many multiblock
problems seem to have this property.
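
The observation that some entries stop growing can be screened for mechanically. The sketch below, in Python, flags entries whose length is unchanged over the last few values of $N$; the function name `stable_entries` and the window size are assumptions of this illustration, and a finite run of equal lengths only suggests (it does not prove) that the support has stopped changing. The data are the len($\Phi^o_N$) values listed in Table III.

```python
def stable_entries(lengths_by_N, window=3):
    """Return {(i, j): length} for entries whose length is unchanged over
    the last `window` values of N.

    lengths_by_N maps N to the len(.) matrix (a list of rows) of the DA
    solution of order N. An empty dict is returned when fewer than `window`
    values of N are available.
    """
    recent = [lengths_by_N[N] for N in sorted(lengths_by_N)[-window:]]
    if len(recent) < window:
        return {}
    stable = {}
    for i, row in enumerate(recent[-1]):
        for j, value in enumerate(row):
            if all(m[i][j] == value for m in recent):
                stable[(i, j)] = value
    return stable


# len(.) values of Table III (Example I, case b), one row per output.
table_iii = {1: [[3], [2]], 2: [[4], [3]], 3: [[4], [4]], 4: [[5], [4]],
             5: [[6], [4]], 6: [[7], [4]], 7: [[8], [4]], 8: [[9], [4]]}
print(stable_entries(table_iii))   # prints {(1, 0): 4}
```

Here the second output (index 1) is flagged with length 4, matching the remark in Section VIII that its length stays at 4, which is what motivates moving it into the first row.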

A multiblock problem in this class can be viewed as
dominated by a one-block partition. In other words, there
is an embedded one-block problem that is further constrained
by the rank interpolation conditions. Such constraints,
however, are not enough to change the polynomial
nature of the optimal solution corresponding to that
partition, although, in general, they have the effect of
increasing its order. With this we extend the notion of TD
one-block partitions where the added constraints due to
the rank interpolation conditions were totally inactive.

Definition 10.1: Given a multiblock problem, a one-block
partition is partially dominant (PD) if all $\ell_1$ optimal
solutions are polynomial in the entries corresponding to
such partition.

Clearly,  a  TD  one-block  partition  is  also  PD  but  not 
vice  versa.  Based  on  this  definition  we  state  the  following 
conjecture. 

Conjecture 10.1: Given a multiblock problem with a PD
one-block partition, there exists a positive integer $N^*$
such that the DA solution, $\Phi^o_N$, for $N > N^*$ captures the
exact support of the sequences corresponding to the PD
one-block partition. Furthermore, since the actual linear
program splits $\Phi^o_N$ into the difference of two positive
sequences ($\Phi^{o+}_N$ and $\Phi^{o-}_N$), the sign of the nonzero entries
of the exact solution corresponding to the PD partition
is also captured. That is, for any pair of indexes $(i, j)$
in the PD partition, and $N > N^*$,

$$\phi^o_{ij}(k) = 0 \iff \phi^o_{ij,N}(k) = 0$$
$$\phi^o_{ij}(k) > 0 \iff \phi^o_{ij,N}(k) > 0$$
$$\phi^o_{ij}(k) < 0 \iff \phi^o_{ij,N}(k) < 0.$$

This conjecture is supported by a fair amount of numerical
experiments covering the most obvious combinations
(i.e., two-block row and column problems and four-block
problems with different input-output dimensions). At the
same time, it is consistent with Theorem 9.1 but stronger.
Indeed, the conjecture claims that the superoptimal solution
will not be supported for $t > L$. This conjecture, if
proven correct, has interesting consequences. To illustrate
some of the ideas involved, consider the following simple
two-block column problem:

$$\begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} - \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} q$$

and assume, without loss of generality, that $u_1$ and $u_2$ are
polynomials (this can always be obtained by polynomial
factorization of $U$). Further, assume that $(h_1 \;\; h_2)^T$ is a
polynomial feasible solution and that the outputs are
ordered such that $\phi_1$ is PD. Then we have the following
equality due to the rank interpolation conditions:

$$u_2 \phi^o_1 - u_1 \phi^o_2 = u_2 h_1 - u_1 h_2. \quad (47)$$

Assume that all zeros common to $u_1$ and $u_2$ have been
canceled out from the above equation. Clearly, the right-hand
side of (47) is polynomial, and furthermore, the first
term on the left-hand side is polynomial since we assumed
that $\phi_1$ is PD. Therefore, the second term on the left-hand
side must be polynomial. This implies that two situations
are possible: either $\phi^o_2$ is polynomial or it has stable poles
that are canceled by stable zeros of $u_1$.

This observation has interesting implications. On one
hand, there is a class of multiblock problems with polynomial
optimal solutions that is characterized by the absence
of stable zeros in $u_1$. Such solutions can then be
computed exactly by either the FMV or the DA method.
On the other hand, if $u_1$ has stable zeros and $\phi^o_2$ is
infinitely supported, the rate at which $\phi^o_2$ decays is given
by a subset of the stable zeros of $u_1$. This information
could be used to transform the original problem into a
finite dimensional one for which exact solutions are computable.
This approach is currently under investigation. It
should be noted that the above ideas can be easily extended
to the general multiblock problem.

Finally, note that if the above conjecture is correct, the
DA algorithm would automatically reorder any TD partition
in $\Phi^o$ and provide the exact answer, without the
need to solve all possible combinations of one-block problems
(see discussion after Corollary 7.3).

XI. A Synthesis Example

In this section, we apply the DA method to a specific
control problem, namely, the pitch axis control of the X29
aircraft. The motivation for doing so is two-fold: first, to
illustrate the use of the delay augmentation method in a
more realistic problem, and second, to have a first look at
the frequency domain features of an $\ell_1$-optimal design
(albeit for one particular example). In order to give some
perspective to this presentation, we will compare the
characteristics of the $\ell_1$ design with those of an
$\mathcal{H}_\infty$-optimal design.

It should be stressed, however, that this particular control
problem was not chosen for the purpose of demonstrating
extreme behaviors of the $\ell_1$ and $\mathcal{H}_\infty$ optimal
solutions. Rather, it was candidly selected as an interesting
control problem in general.


The X29 aircraft poses an interesting control problem
due to its revolutionary forward-swept wing design. With
such a configuration, the center of gravity lies behind the
aerodynamic center of pressure, rendering the aircraft
statically unstable. Thus, a control system has to actively
stabilize the aircraft during flight.

We are interested in designing a digital controller for a
simple model of the pitch dynamics of the aircraft. The
airplane has three types of control surfaces: canard wings,
flaperons on the main wings and strakes on the tail. In
order to simplify the model, the action of these control
surfaces is lumped into one equivalent actuator with
first order dynamics. Similarly, the gyroscopes and
accelerometers are modeled by an equivalent sensor with
negligible dynamics. Thus, the system can be approximately
represented by the following continuous time SISO
plant [35]:

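$$P(s) = \underbrace{\frac{s + 3}{(s + 10)(s - 6)}}_{\text{airframe}} \cdot \underbrace{\frac{20}{s + 20}}_{\text{equiv. actuator}} \cdot \underbrace{\frac{s - 26}{s + 26}}_{\text{overhead}} \tag{48}$$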

where s is the Laplace variable. The airframe factor
corresponds to a simplified model of the pitch dynamics of
the airplane flying at a low altitude and with an air speed
of approximately 0.9 Mach. The overhead factor lumps
the equivalent low frequency phase lag introduced by the
dynamics that were neglected in deriving the reduced
model (48). In particular, this all-pass factor is an approximate
representation of the collected phase lag of the
gyroscopic sensor dynamics, the actuator servo dynamics,
the airframe flexible modes, and the digital implementation
(i.e., pre-filter, zero order hold and computing delay)
corresponding to a sampling period $\Delta t = 1/30$ seconds.

Consider  the  following  formal  synthesis  problem: 


where S is the sensitivity function. Such a problem requires
the discrete time version of (48) and two weighting transfer
functions. The $\lambda$-domain model of the plant, $P(\lambda)$, is
obtained by discretizing (48) assuming a zero order hold
at the plant input and a synchronized sampling of the
(pre-filtered) plant output. The weights are chosen as
follows: let $W_1$ be a scalar equal to 0.01 and let $W_2(\lambda)$ be
the discrete time version of the continuous time transfer
function $(s + 1)/(s + 0.001)$ for a sampling period
$\Delta t = 1/30$. This choice of weights reflects a trade-off between
low frequency performance and the control effort.
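To make the discretization step concrete, the following minimal sketch maps the continuous-time poles of (48) and of the weight $(s + 1)/(s + 0.001)$ to discrete time. It assumes the standard zero-order-hold pole mapping $z = e^{s \Delta t}$ and reports locations in the z-plane rather than in the $\lambda$-domain used by the paper; the helper name `zoh_pole` is hypothetical. Zeros do not follow this mapping, so the sketch covers poles only.

```python
import math

DT = 1.0 / 30.0  # sampling period from the text, in seconds


def zoh_pole(s_pole, dt=DT):
    """Map a continuous-time pole to the z-plane under a zero-order hold."""
    return math.exp(s_pole * dt)


# Continuous-time poles read off (48) and off W2(s) = (s + 1)/(s + 0.001).
POLES = {
    "airframe (unstable)": 6.0,
    "airframe (stable)": -10.0,
    "equiv. actuator": -20.0,
    "overhead": -26.0,
    "W2 weight": -0.001,
}

if __name__ == "__main__":
    for name, p in POLES.items():
        print(f"{name:20s} s = {p:8.3f}  ->  z = {zoh_pole(p):.6f}")
```

The unstable airframe pole at s = 6 lands outside the unit circle, and the weight pole lands just inside it, which is the sense in which the weight behaves almost like a pure integrator.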

Note that a controller designed for the discrete-time
model of a continuous-time plant completely ignores the
inter-sampling behavior of the system. An optimal controller
designed in this way is actually suboptimal for the
original hybrid system. This notwithstanding, we will carry
out the design and comparison entirely in the discrete
domain (both for $\ell_1$ and $\mathcal{H}_\infty$ designs), taking the discrete
time plant model and weights as the starting point.

A. Computing an $\ell_1$-Suboptimal Controller

With this problem set-up we are ready to apply the
delay augmentation algorithm as described in Section IX.
Table VI shows the sequence of results obtained in this
case, starting with N = 4. Note how the length of the
response corresponding to the weighted sensitivity stops
increasing after N = 7, suggesting that such a row is PD.
For N = 80 the achieved $\ell_1$ norm is within one percent of
the optimal, so we stop the iteration process. It is interesting
to note how slowly the upper bound converges to the
optimal. This behavior is consistent with the observations
made in Section X regarding the rate of decay of $\Phi^o$ when
one row is PD. Indeed, if the first row corresponding to
the weighted sensitivity is PD, then the rate of decay of
the second row is dictated by the stable zeros of $\hat{u}_1(\lambda)$. It
is easy to check that such a transfer function contains two
stable zeros that are close to the unit circle. Then, if the
optimal second row decays slowly, the extra free parameter
($q_1$) corresponding to the DA solution will be significant
even for large values of N.
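The stopping rule described above can be sketched as a relative-gap test on the two bound columns of Table VI. The numbers below are transcribed from that table; reading the first numeric column as the lower bound and the second as the upper bound is an assumption based on the text, and the function names are hypothetical.

```python
# (N, lower bound, upper bound) rows transcribed from Table VI.
TABLE_VI = [
    (4, 3.254, 1256.4),
    (5, 4.024, 7.619),
    (6, 4.045, 5.059),
    (7, 4.048, 5.052),
    (8, 4.051, 4.652),
    (9, 4.051, 4.319),
    (10, 4.052, 4.224),
    (20, 4.053, 4.196),
    (40, 4.053, 4.158),
    (80, 4.054, 4.091),
]


def relative_gap(lower, upper):
    """Gap between the bounds, relative to the lower bound.

    The optimal norm lies between the two bounds, so this over-estimates
    the relative distance of the achieved (upper) norm from the optimum.
    """
    return (upper - lower) / lower


def first_n_within(table, tol):
    """Smallest tabulated N whose relative gap is at most tol, else None."""
    for n, lower, upper in table:
        if relative_gap(lower, upper) <= tol:
            return n
    return None


if __name__ == "__main__":
    for n, lower, upper in TABLE_VI:
        print(f"N = {n:3d}   gap = {100 * relative_gap(lower, upper):10.3f} %")
    print("first N within 1%:", first_n_within(TABLE_VI, 0.01))
```

With these values the gap first drops below one percent at N = 80, matching the point at which the iteration is stopped in the text.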

Next, we will compare the time and frequency domain
characteristics of the $\ell_1$ suboptimal design corresponding
to N = 80 with an $\mathcal{H}_\infty$ design. The comparison will be
based on three different aspects of the solutions: 1) operator
norms, 2) frequency response characteristics, and 3)
time response characteristics.

Table VII shows how the $\ell_1$ and $\mathcal{H}_\infty$ norms of the two
solutions compare. As expected, the $\mathcal{H}_\infty$ design achieves
better $\mathcal{H}_\infty$ norms while the $\ell_1$ design achieves better $\ell_1$
norms. A cross examination shows that both solutions are
fairly good in terms of both measures. In fact, this does
not come as a surprise in view of the following norm
inequality [5], valid for any stable finite dimensional system
$H \in \ell_1^{p \times q}$:

$$\|H\|_\infty \le \sqrt{p}\,\|H\|_1 \le \sqrt{p}\,(2n + 1)\sqrt{q}\,\|H\|_\infty,$$

where n is its McMillan degree. Thus, minimizing either of
the two norms will also "push down" the other one,
particularly in a low order problem such as the one under
consideration.
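The scalar case (p = q = 1) of this inequality is easy to check numerically. The sketch below is an illustration on a made-up three-tap FIR filter, not on the X29 closed loop: it computes the $\ell_1$ norm as the sum of absolute impulse-response values and estimates the $\mathcal{H}_\infty$ norm on a frequency grid. The filter coefficients, the grid size, and the function names are all assumptions chosen for the example.

```python
import cmath
import math


def l1_norm(h):
    """l1 norm of a SISO FIR system: sum of |impulse response|."""
    return sum(abs(x) for x in h)


def hinf_norm(h, grid=2048):
    """Grid estimate of max over theta of |sum_k h[k] exp(-j k theta)|.

    A grid can only under-estimate the true peak, so this is a lower
    estimate of the H-infinity norm.
    """
    best = 0.0
    for m in range(grid):
        theta = 2.0 * math.pi * m / grid
        value = abs(sum(hk * cmath.exp(-1j * k * theta)
                        for k, hk in enumerate(h)))
        best = max(best, value)
    return best


if __name__ == "__main__":
    h = [1.0, -0.5, 0.25]   # hypothetical FIR impulse response
    n = len(h) - 1          # McMillan degree of this FIR filter
    print("Hinf estimate:", hinf_norm(h))
    print("l1 norm      :", l1_norm(h))
    print("(2n+1) * Hinf:", (2 * n + 1) * hinf_norm(h))
```

For this particular filter the frequency response peaks at theta = pi, where all terms add in phase, so the two norms coincide and the left inequality is tight.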

Next, let us examine the frequency domain features.
Both designs have fairly similar frequency domain characteristics
as shown in Figs. 2 and 3. While the $\ell_1$ design has
better disturbance rejection at low and medium frequencies,
it overshoots at high frequencies where the $\mathcal{H}_\infty$ norm
is achieved. In fact, Fig. 3 shows that both controllers
have very similar responses, the only significant difference
being at frequencies close to $\pi/\Delta t$. An interesting difference,
though, is that the $\ell_1$ design results in an unstable
controller while the $\mathcal{H}_\infty$ design does not. Finally, we compare
the weighted and unweighted sensitivity step responses
of both designs (Figs. 4 and 5). Note how the
output of the plant, y, converges to zero faster in the $\ell_1$
design than in the $\mathcal{H}_\infty$ design (Fig. 5). This is a direct result
of the smaller weighted steady state error in the $\ell_1$ design


Fig. 2. Frequency response of S for $\ell_1$ design (full line) and $\mathcal{H}_\infty$ design
(dashed line).


Fig. 3. Frequency response of K for $\ell_1$ design (full line) and $\mathcal{H}_\infty$ design
(dashed line).


TABLE VI

X29 Synthesis Problem: Delay Augmentation Algorithm

  N    Lower bound   Upper bound   len(Φ)^T   ord(K)   Comments
  4       3.254       1256.4       (10 5)       11     Reorder outputs
  5       4.024          7.619     (5 5)         6     Keep order
  6       4.045          5.059     (5 6)         6
  7       4.048          5.052     (6 7)         6
  8       4.051          4.652     (6 8)         6
  9       4.051          4.319     (6 9)         6
 10       4.052          4.224     (6 10)        6
 20       4.053          4.196     (6 20)        6
 40       4.053          4.158     (6 36)        6
 80       4.054          4.091     (6 69)        6

(see Fig. 4) and the pole of $W_2$ at 0.9999 (almost a pure
integrator).


Fig. 4. Weighted sensitivity step response for $\ell_1$ design (full line) and
$\mathcal{H}_\infty$ design (dashed line).


TABLE VII

Operator Norm Comparison ($\Delta t = 1/30$)

                      ‖·‖_∞    ‖·‖_1    ord(K)
  H∞ design            2.4      5.2       5
      W_1 K S          2.0      3.3
      W_2 S            2.4      5.2
  ℓ1 design            3.8      4.1       6
      W_1 K S          2.8      4.1
      W_2 S            3.6      4.1

XII. Conclusions

A complete and comprehensive study of the general
$\ell_1$-optimal multiblock problem has been presented. It
advances the understanding of these problems both from a
theoretical and a practical point of view.

The paper makes the following contributions:

1) The interpolation conditions are stated in a concise
and natural way. As a result the general theory is
developed in simpler terms and with a minimum number
of assumptions.

2) Methods for computing the interpolation conditions
were tied directly to matrix theory.

3) Further insight was gained on the structure of the
optimal solution which allowed us to distinguish between
different classes of multiblock problems (i.e., problems
with TD or PD one-block partitions).

4) A new method for computing suboptimal (or optimal
in some special cases) solutions was proposed that
exploits such structure. With this method, a sequence of
suboptimal controllers can be computed iteratively, avoiding
(for a class of problems) the problem of order inflation.
Each iteration requires the solution of one finite
dimensional linear program, and generates upper and
lower bounds of the optimal norm with the proper convergence
properties. In contrast, previously known approximation
schemes required the solution of two linear
programs at each iteration, and generated suboptimal
controllers with increasing order. In addition, the DA
method unifies the treatment of zero and rank interpolations
and avoids the coprime factorization of U and V
(this was required in previous work [29]). Further, this
approach generates a minimal set of constraints describing
the feasible subspace [18].

5) A result was presented relating the support characteristics
of the optimal and superoptimal solutions, followed
by a stronger conjecture.

Several examples were worked out to illustrate the
properties of the DA method. In particular, a multiblock
problem corresponding to the X29 pitch axis control was
solved. The operator norms and frequency domain properties
of the solutions were compared with those of a
standard $\mathcal{H}_\infty$ design. Although the designs turned out to be
quite similar, some differences were found at high frequencies.

As a final note, let us point out that there are still
important open questions to be answered in connection
with $\ell_1$ optimization. From a theoretical point of view,
stronger results regarding the support structure of the
optimal solution are needed. In particular, a proof of, or a
counterexample to, the conjecture presented is needed. As pointed
out before, proving such a conjecture could provide the
insight to uncover the underlying finite dimensional structure
that the general multiblock problem may have. Also,
the existence in general of optimal rational solutions is an
interesting open question connected to the above.

Finally, a model reduction theory in the context of $\ell_1$
optimization would be of significant practical value. Recall
that multiblock as well as one-block problems may have
high order optimal controllers (depending on the interpolation
data). A straightforward approach to computing
lower order suboptimal controllers results from restricting
the appropriate entries of $\Phi$ to have fixed finite support.
But such an approach may be far from optimal. Therefore,
optimal model reduction techniques would be useful in
practical design.

Abstract: We consider the problem of identification of linear systems in the presence of measurement noise which is unknown but
bounded in magnitude by some $\delta > 0$. We focus on the case of linear systems with a finite impulse response. It is known that the
optimal identification error is related (within a factor of 2) to the diameter of a so-called uncertainty set and that the latter
diameter is upper-bounded by $2\delta$, if a sufficiently long identification experiment is performed. We establish that, for any $K \ge 1$, the
minimal length of an identification experiment that is guaranteed to lead to a diameter bounded by $2K\delta$ behaves like $2^{Nf(1/K)}$,
when N is large, where N is the length of the impulse response and f is a positive function known in closed form. While the
framework is entirely deterministic, our results are proved using probabilistic tools.


Keywords:  Worst-case  identification;  sample  complexity;  bounded  but  unknown  disturbance. 


1.  Introduction 

Recently, there has been increasing interest in the problem of worst-case identification in the
presence of bounded noise. In such a formulation, a plant is known to belong to a model set $\mathcal{M}$ and its
measured output is subject to an unknown but bounded disturbance. The objective is to use input/output
information to derive a plant estimate that approximates the true plant as closely as possible in some
induced norm. For frequency domain experiments, algorithms that guarantee accurate identification in
the $\mathcal{H}_\infty$ setting were furnished in [4,5,6,7]. For general experiments, algorithms that guarantee accurate
identification in the $\ell_1$ sense were suggested in [17,18]. These algorithms are based on the Occam's
Razor principle by which the simplest model is always used to explain the given data. The optimal
asymptotic worst-case error is characterized in terms of the diameter of the uncertainty set, i.e., the set of
all plants consistent with all the data and the noise model. Other related work on the worst-case
identification problem can be found in [8,10,11,19]. In particular, [10] presents a specific experiment that
uses a Galois sequence as an input, and shows that the standard Chebyshev algorithm results in an
asymptotic error bounded by the worst-case diameter of the uncertainty set. A Galois sequence is
constructed by concatenating a countable number of finite sequences, such that the k-th sequence
contains all possible combinations of {-1, +1} of length k, and so it is rich enough to accurately identify
exactly k parameters of the impulse response. The length of each sequence is clearly exponential in k.
Finally, identification problems with bounded but unknown noise were studied in the context of
prediction (not worst-case) in [12,13]. Other related work, for nonlinear systems, can be found in [3].

An important result from the work of [17,18] states that for the model set of all stable plants, accurate
identification in the $\ell_1$ sense is possible if and only if the input excites all possible frequencies on the
unit circle. This is due to two reasons: the first is that bounded noise is quite rich and the second is that
minimizing an induced norm such as the $\ell_1$ norm implies that the estimate has a very good predictive
power. Inputs with such properties tend to be quite long, and this suggests that the sample complexity of
this kind of identification problem tends to be quite high, as a function of the number of estimated
parameters of the impulse response.

In this paper, we will study the sample complexity (required length) of the inputs for worst-case
identification of FIR plants, under the $\ell_1$ norm, in the presence of arbitrary bounded measurement
noise. It will be shown that in order to guarantee that the diameter of the uncertainty set is bounded by
$2K\delta$, where $\delta$ is the bound on the noise and K is a constant (larger than 1), the length of the input must
increase like $2^{Nf(1/K)}$, where N is the length of the impulse response and f is a positive function. Since
the worst-case error is at least half of the diameter, these results show that the sample complexity is
exponential in N even if the allowable accuracy is far from optimal, and capture the limitations of
accurate identification in the worst-case set-up. We also show that our sample complexity estimate is
tight, in the sense that there exist inputs of length approximately equal to $2^{Nf(1/K)}$ that lead to a $2K\delta$
bound on the diameter. An interesting technical aspect of this paper is that the existence of such inputs
is established by means of a probabilistic argument reminiscent of the methods commonly employed in
information theory.

Other researchers have also recently addressed the sample complexity of worst-case identification. In
a personal discussion with Poolla (January 1992), he pointed out to us (specifically to Dahleh) that the
optimal identification case had exponential complexity, as in the lower bound of our Theorem 2.1. We
have recently received a preprint by Poolla and Tikku [14] which, among other results, contains
exponential lower bounds for the sample complexity of suboptimal identification. Their
lower bounds are similar to, although somewhat weaker than, the lower bound in our Theorem 2.2.
Chronologically, the results of [14] precede ours, although we did not have knowledge of their results
when writing our paper. Finally, [14] contains some upper bounds but, unlike our Theorem 2.2, they are
far from being tight. Also, while writing our paper, we learned that Milanese [9] has obtained a result
similar to the exponential lower bound in our Theorem 2.1. His report does not contain any results for
the case where the error is within a factor of the optimal.


2.  Problem  definition 

Let $\mathcal{M}_N$ be the set of all linear systems with a finite impulse response of length N. Any $h \in \mathcal{M}_N$
will be identified with a finite sequence $(h_1, \ldots, h_N) \in \mathbb{R}^N$. Let $U_n$ be the set of all
sequences $\{u_i\}_{i=1}^{\infty}$ such that $|u_i| \le 1$ for all i, and $u_i = 0$ for i > n. Any element of $U_n$ will be called an
input of length n. Finally, for any positive number $\delta$, let $D_\delta$, called the disturbance set, be the set of all
infinite sequences $d = \{d_i\}_{i=1}^{\infty}$ such that $|d_i| \le \delta$ for all i.

We are interested in experiments of the following type: an input $u \in U_n$ is applied to the (unknown)
system h and we observe the noisy measurement

$$y = h * u + d, \tag{2.1}$$

where * denotes convolution, and where $d \in D_\delta$ plays the role of an output disturbance or measurement
noise. It is clear that, for i > N + n, we have $y_i = d_i$, and $y_i$ carries no useful information on the
unknown system h.

The set that contains all plants in the model set that are consistent with the input/output data and
the noise model is called the uncertainty set and is given by

$$S_{N,n}(y, u) = \{\phi \in \mathcal{M}_N \mid \|y - \phi * u\|_\infty \le \delta\}.$$

The diameter diam(S) of a subset S of $\ell_1$ is defined by

$$\operatorname{diam}(S) = \sup_{x, y \in S} \|x - y\|_1.$$

We then define the worst case diameter for a given input $u \in U_n$ by

$$D_{N,n}(u) = \sup_{\phi \in \mathcal{M}_N} \, \sup_{d \in D_\delta} \operatorname{diam}\big(S_{N,n}(u * \phi + d, u)\big).$$

Any identification algorithm that lets its plant estimate be an element of the uncertainty set has an error
upper-bounded by the diameter of the uncertainty set. Besides, it is shown in [15,16,17] that the error of
any identification algorithm is lower-bounded by half the diameter of the uncertainty set. Define

$$D^*_{N,n} = \inf_{u \in U_n} D_{N,n}(u).$$

It is shown in [17] that

$$\lim_{n \to \infty} D^*_{N,n} = 2\delta. \tag{2.2}$$


Thus, as the length of the experiments increases, and with a suitable identification algorithm, the
worst-case error can be made as small as twice the disturbance bound $\delta$, but no smaller than $\delta$. A
question that immediately arises is how long n should be for the error to approach $2\delta$. We address this
question by focusing on the behavior of the diameter of the uncertainty set, as the inputs are allowed to
become longer.

Let us define

$$n^*(N) = \min\{n \mid D^*_{N,n} = 2\delta\}. \tag{2.3}$$


It is far from a priori clear whether $n^*(N)$ is finite. This is answered by the following theorem which also
serves as motivation for the main theorem (Theorem 2.2) of this paper.

Theorem 2.1.¹ For any $\delta > 0$ and N, we have $2^{N-1} + N - 1 \le n^*(N) \le 2^N + N - 1$.


Proof. We start by proving the lower bound on $n^*(N)$. Fix N and let us denote $n^*(N)$ by m. Suppose
that $m < \infty$, and let $u \in U_m$ be such that $D_{N,m}(u) = 2\delta$. Let $v \in \{-1, 1\}^m$ be defined by $v_i = 1$ if
$u_i \ge 0$, and $v_i = -1$ if $u_i < 0$. For notational convenience, we define $u_i = 0$ for $i \le 0$. We distinguish two
cases:

(a) Suppose that for every $\phi \in \{-1, 1\}^N$, there exists some $i(\phi) \in \{1, \ldots, m - N + 1\}$ such that either
$\phi$ or $-\phi$ is equal to $(v_{i(\phi)+N-1}, v_{i(\phi)+N-2}, \ldots, v_{i(\phi)})$. It is clear that $i(\phi)$ can be the same for at most two
different values of $\phi$. Since the number of different choices for $\phi$ is $2^N$, it follows that $m - N + 1 \ge 2^{N-1}$,
which proves that $m \ge 2^{N-1} + N - 1$.

(b) Suppose now that the assumption of case (a) fails to hold. Let $\phi \in \{-1, 1\}^N$ be such that both $\phi$
and $-\phi$ are different from $(v_{i+N-1}, v_{i+N-2}, \ldots, v_i)$, for all $i \in \{1, \ldots, m - N + 1\}$. Suppose that
$h = \delta\phi/(N - 1)$. Then

$$|(h * u)_i| = \left| \sum_{k=1}^{N} h_k u_{i-k} \right| = \frac{\delta}{N - 1} \left| \sum_{k=1}^{N} \phi_k u_{i-k} \right|. \tag{2.4}$$

Since $|\phi_k| = 1$ and $|u_{i-k}| \le 1$, we see that $\left|\sum_{k=1}^{N} \phi_k u_{i-k}\right| \le N$. Let i be such that $N < i \le m$. By our
assumption on $\phi$, the signs of $u_{i-k}$ cannot be the same as the signs of $\phi_k$ for all k, nor the same
as the signs of $-\phi_k$ for all k, and this leads to the stronger inequality

$$\left| \sum_{k=1}^{N} \phi_k u_{i-k} \right| \le N - 1. \tag{2.5}$$

We finally note that for $i \notin (N, m]$, at least one of the summands $\phi_k u_{i-k}$ is equal to zero, which implies
that (2.5) is valid for all i. Combining (2.4) and (2.5), we conclude that $|(h * u)_i| \le \delta$ for all i. Therefore
there exists a choice for the disturbance sequence d under which the observed output $h * u + d$ is equal
to zero at all times. Using the same argument, we see that if $h = -\delta\phi/(N - 1)$, there also exists another
choice of the disturbance sequence for which the observed output is zero at all times.

We have thus shown that it is possible to observe an output sequence which is identically equal to zero
while the true system can be either $\delta\phi/(N - 1)$ or $-\delta\phi/(N - 1)$. This implies that the worst case
diameter satisfies

$$D_{N,m}(u) \ge 2 \, \|\delta\phi/(N - 1)\|_1 > 2\delta. \tag{2.6}$$

But this contradicts the definition of $m = n^*(N)$ and shows that case (b) is not possible. Thus, case (a) is
the only possible one, and the lower bound has already been established for that case. The upper bound
follows easily by using the input sequence proposed in [10,17]. Let u be a finite sequence whose entries
belong to {-1, 1} and such that for every $\phi \in \{-1, 1\}^N$ there exists some $i(\phi)$ such that
$\phi = (u_{i(\phi)+N-1}, \ldots, u_{i(\phi)})$. Such a sequence, called a Galois sequence, can be chosen so that its length
is equal to $2^N + N - 1$ [10]. With this input, the worst case diameter is equal to $2\delta$. □

¹ We acknowledge Professor Poolla for pointing out an error in the previous version of this theorem.

Theorem 2.1 has the disappointing conclusion that the worst-case error is guaranteed to become at
most $2\delta$ only if a very long experiment is performed. In practice, values of N of the order of 20 or 30
often arise. For such cases, the required length of an identification experiment is prohibitively long if an
error guarantee as small as $2\delta$ is desired. This motivates the problem studied in this paper: if the
objective is to obtain an identification error within a factor K of the optimal value, can this be
accomplished with substantially smaller experiments? Theorem 2.2 below is as disappointing as
Theorem 2.1: it shows that experiments of length exponential in N are required to obtain such an error
guarantee. The exponent depends of course on K and we are able to compute its asymptotic value (as N
increases) exactly.
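Both halves of the proof of Theorem 2.1 can be checked on small cases. The sketch below builds a ±1 input of length 2^N + N - 1 that contains every sign pattern of length N, using a standard binary de Bruijn construction (an assumption: the text only cites [10] for the Galois sequence and does not give a construction). It also shows that a short input missing some pattern admits a system h = δφ/(N - 1) whose noiseless output stays within δ, so h and -h cannot be told apart. All function names are hypothetical.

```python
from itertools import product


def de_bruijn_bits(n):
    """Cyclic binary de Bruijn sequence of order n (length 2**n)."""
    a = [0] * (n + 1)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def galois_input(n):
    """+/-1 input of length 2**n + n - 1 containing every pattern of length n."""
    bits = de_bruijn_bits(n)
    bits = bits + bits[:n - 1]          # unroll the cyclic sequence
    return [2 * b - 1 for b in bits]


def missing_patterns(u, n):
    """Patterns phi such that neither phi nor -phi is a (time-reversed) window of u."""
    windows = {tuple(reversed(u[i:i + n])) for i in range(len(u) - n + 1)}
    return [phi for phi in product((-1, 1), repeat=n)
            if phi not in windows and tuple(-x for x in phi) not in windows]


def conv_peak(h, u):
    """max_i |(h * u)_i| for finite sequences h and u."""
    out = [0.0] * (len(h) + len(u) - 1)
    for i, hi in enumerate(h):
        for j, uj in enumerate(u):
            out[i + j] += hi * uj
    return max(abs(x) for x in out)


def length_bounds(n):
    """Lower and upper bounds on n*(N) stated in Theorem 2.1."""
    return 2 ** (n - 1) + n - 1, 2 ** n + n - 1


if __name__ == "__main__":
    N, delta = 3, 1.0
    print("Galois input:", galois_input(N), "missing:", missing_patterns(galois_input(N), N))
    short = [1] * 5                      # a short input that misses most patterns
    phi = missing_patterns(short, N)[0]
    h = [delta * x / (N - 1) for x in phi]
    print("phi =", phi, " peak |h*u| =", conv_peak(h, short),
          " 2*||h||_1 =", 2 * sum(abs(x) for x in h))
    print("bounds for N = 20:", length_bounds(20))
```

For the constant input, the peak of |h * u| does not exceed δ while 2‖h‖₁ = 3δ exceeds 2δ, which is the ambiguity exploited in case (b). The last line shows how quickly the required length grows for a realistic N.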


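Theorem 2.2. Fix some K > 1 and let

$$n^*(N, K) = \min\{n \mid D^*_{N,n} \le 2K\delta\}. \tag{2.7}$$

Then:

(a) $n^*(N, K) \ge 2^{Nf(1/K) - 1} - N + 2\lceil N/K \rceil - 1$.

(b) $\lim_{N \to \infty} (1/N) \log n^*(N, K) = f(1/K)$.

Here, $f : (0, 1) \to \mathbb{R}$ is the function defined by²

$$f(\alpha) = 1 + \left(\frac{1 - \alpha}{2}\right) \log\left(\frac{1 - \alpha}{2}\right) + \left(\frac{1 + \alpha}{2}\right) \log\left(\frac{1 + \alpha}{2}\right). \tag{2.8}$$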

Notice that the function f defined by (2.8) satisfies $f(\alpha) = 1 - H(\tfrac{1}{2}(1 - \alpha))$, where H is the binary
entropy function. In particular, f is positive and continuous for $\alpha \in (0, 1)$. Before going ahead with the
main part of the proof, we need to develop some lemmas that will be our main tools.
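To get a feel for the exponent, the following sketch evaluates f from (2.8) and the right-hand side of the lower bound in Theorem 2.2(a). Logarithms are base 2, as stated in the paper's footnote; the function names `f` and `lower_bound` are chosen here for illustration.

```python
import math


def f(alpha):
    """f(alpha) = 1 - H((1 - alpha)/2) with base-2 logs, for 0 < alpha < 1."""
    p = (1.0 - alpha) / 2.0
    q = (1.0 + alpha) / 2.0
    return 1.0 + p * math.log2(p) + q * math.log2(q)


def lower_bound(N, K):
    """Right-hand side of Theorem 2.2(a): a lower bound on n*(N, K)."""
    return 2.0 ** (N * f(1.0 / K) - 1.0) - N + 2 * math.ceil(N / K) - 1


if __name__ == "__main__":
    for K in (1.25, 2.0, 4.0):
        print(f"K = {K:4.2f}   f(1/K) = {f(1.0 / K):.4f}")
    for N in (50, 100, 200):
        print(f"N = {N:3d}   lower bound for K = 2: {lower_bound(N, 2.0):.3e}")
```

Relaxing the accuracy requirement (larger K) shrinks the exponent f(1/K) but never makes it zero, so the bound remains exponential in N. For small N the expression can be negative, in which case the bound is vacuous.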

Lemma 2.1. Let $X_1, X_2, \ldots, X_N$ be independent binomial random variables with $\Pr(X_i = 1) = \Pr(X_i = -1) = \tfrac{1}{2}$
for every i.

(a) Let $u_i \in [-1, 1]$, $i = 1, \ldots, N$. Then, for every $\alpha \in (0, 1)$, we have

$$\Pr\left( \frac{1}{N} \sum_{i=1}^{N} u_i X_i \ge \alpha \right) \le 2^{-N f(\alpha)}. \tag{2.9}$$

(b) For every $\alpha \in (0, 1)$, we have

$$\lim_{N \to \infty} \frac{1}{N} \log \Pr\left( \frac{1}{N} \sum_{i=1}^{N} X_i \ge \alpha \right) = -f(\alpha). \tag{2.10}$$

² In the definition of f, and throughout the rest of the paper, all logarithms are taken with base 2.

Proof. Part (b) is obtained from the classical Chernoff bound [1] or from counting arguments [2]. Part (a)
also follows from the Chernoff bound, if $u_i = 1$ for all i. It remains to prove part (a) for the general case
of $u_i \in [-1, 1]$.

We first note that because of the symmetry in the distribution of $X_i$, we can assume, without any loss
of generality, that $u_i \in [0, 1]$ for all i. We then have

$$\Pr\left( \frac{1}{N} \sum_{i=1}^{N} u_i X_i \ge \alpha \right) \le \inf_{s > 0} \prod_{i=1}^{N} E\left[ e^{s(u_i X_i - \alpha)} \right] \le \inf_{s > 0} \prod_{i=1}^{N} E\left[ e^{s(X_i - \alpha)} \right] = 2^{-N f(\alpha)}.$$

The first inequality is obtained by following the steps in the standard proof of the Chernoff bound; the
second inequality is obtained by verifying that $e^{su} + e^{-su} \le e^{s} + e^{-s}$ for all $u \in [0, 1]$; finally, the final
equality is a simple calculation which is also part of the classical proof of the Chernoff bound. □
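For the special case $u_i = 1$, the bound in part (a) can be compared against the exact binomial tail. The sketch below does this with integer arithmetic for the tail and the same f as in (2.8). The choice α = 0.5 with N a multiple of 4 is made only so that the threshold (1 + α)N/2 is an exact integer in floating point; the function names are hypothetical.

```python
import math


def f(alpha):
    """f(alpha) = 1 - H((1 - alpha)/2) with base-2 logs, for 0 < alpha < 1."""
    p = (1.0 - alpha) / 2.0
    q = (1.0 + alpha) / 2.0
    return 1.0 + p * math.log2(p) + q * math.log2(q)


def tail_prob(N, alpha):
    """Exact Pr((1/N) * sum X_i >= alpha) for i.i.d. X_i uniform on {-1, +1}.

    If k of the X_i equal +1, the sum is 2k - N, so the event is
    k >= (1 + alpha) * N / 2.
    """
    k_min = math.ceil((1.0 + alpha) * N / 2.0)
    return sum(math.comb(N, k) for k in range(k_min, N + 1)) / 2 ** N


def chernoff_bound(N, alpha):
    """Right-hand side of (2.9)."""
    return 2.0 ** (-N * f(alpha))


if __name__ == "__main__":
    alpha = 0.5
    for N in (20, 40, 80, 160):
        exact = tail_prob(N, alpha)
        print(f"N = {N:3d}  exact = {exact:.3e}  bound = {chernoff_bound(N, alpha):.3e}"
              f"  -log2(exact)/N = {-math.log2(exact) / N:.4f}")
    print("f(alpha) =", f(alpha))
```

The exact tail stays below the bound for every N, and the normalized exponent in the last column drifts toward f(α) as N grows, which is the content of part (b).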


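One consequence of Lemma 2.1 is that for any $\epsilon > 0$, there exists some $N_0(\alpha, \epsilon)$ such that

$$\Pr\left( \frac{1}{N} \sum_{i=1}^{N} X_i \ge \alpha \right) \ge 2^{-N(f(\alpha) + \epsilon)}, \qquad \forall N \ge N_0(\alpha, \epsilon). \tag{2.11}$$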


The  following  lemma  strengthens  (2.11)  and  will  be  needed  later  in  the  proof. 


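Lemma 2.2. Let $X_1, \ldots, X_N$ be as in Lemma 2.1. Let $\Theta_N = \{(\theta_1, \ldots, \theta_N) \in \mathbb{R}^N \mid \sum_{i=1}^{N} |\theta_i| = N\}$. Then, for
any $\epsilon_1 > 0$, there exists some $N_1(\alpha, \epsilon_1)$ such that

$$\Pr\left( \frac{1}{N} \sum_{i=1}^{N} \theta_i X_i \ge \alpha \right) \ge 2^{-N(f(\alpha) + \epsilon_1)}, \qquad \forall N \ge N_1(\alpha, \epsilon_1), \ \forall \theta \in \Theta_N. \tag{2.12}$$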

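Proof. Note that the random variables $\sum_{i=1}^{N} \theta_i X_i$ and $\sum_{i=1}^{N} |\theta_i| X_i$ have the same probability distribution.
Therefore, without loss of generality, we can and will assume that $\theta_i \ge 0$ for all i. We have

$$\Pr\left( \sum_{i=1}^{N} \theta_i X_i \ge \alpha N \right) \ge \Pr\left( \sum_{i=1}^{N} \theta_i X_i \ge \alpha N \,\Big|\, \sum_{i=1}^{N} X_i \ge \alpha N \right) \cdot \Pr\left( \sum_{i=1}^{N} X_i \ge \alpha N \right)$$

$$\ge 2^{-N(f(\alpha) + \epsilon_1/2)} \, \Pr\left( \sum_{i=1}^{N} \theta_i X_i \ge \alpha N \,\Big|\, \sum_{i=1}^{N} X_i \ge \alpha N \right), \tag{2.13}$$

where the last inequality holds for all N large enough, as a consequence of (2.11).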

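Given any sequence $X = (X_1, \ldots, X_N)$, let $X^k$ be its cyclic shift by k positions; that is, $X^k =
(X_{k+1}, X_{k+2}, \ldots, X_N, X_1, \ldots, X_k)$. Let $X_i^k$ be the i-th component of $X^k$. By symmetry, the conditional
distribution of X and $X^k$, conditioned on the event $\sum_{i=1}^{N} X_i \ge \alpha N$, is the same. Therefore,

$$\Pr\left( \sum_{i=1}^{N} \theta_i X_i \ge \alpha N \,\Big|\, \sum_{i=1}^{N} X_i \ge \alpha N \right) = \frac{1}{N} \sum_{k=1}^{N} \Pr\left( \sum_{i=1}^{N} \theta_i X_i^k \ge \alpha N \,\Big|\, \sum_{i=1}^{N} X_i \ge \alpha N \right)$$

$$\ge \frac{1}{N} \Pr\left( \exists k \text{ such that } \sum_{i=1}^{N} \theta_i X_i^k \ge \alpha N \,\Big|\, \sum_{i=1}^{N} X_i \ge \alpha N \right) = \frac{1}{N}. \tag{2.14}$$

The last equality follows because if $\sum_{i=1}^{N} X_i \ge \alpha N$, then

$$\sum_{k=1}^{N} \sum_{i=1}^{N} \theta_i X_i^k = \sum_{i=1}^{N} \theta_i \sum_{i=1}^{N} X_i \ge \alpha N^2,$$

which immediately implies that there exists some k for which $\sum_{i=1}^{N} \theta_i X_i^k \ge \alpha N$.
We conclude that (2.13) becomes

$$\Pr\left( \sum_{i=1}^{N} \theta_i X_i \ge \alpha N \right) \ge \frac{1}{N} \, 2^{-N(f(\alpha) + \epsilon_1/2)} \ge 2^{-N(f(\alpha) + \epsilon_1)},$$

where the last inequality follows if N is large enough so that $1/N \ge 2^{-N\epsilon_1/2}$. □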
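The averaging step over cyclic shifts can be checked on a small instance. The sketch below computes the N shifted sums for one weight vector and one sign sequence and confirms that they add up to (Σθ_i)(ΣX_i), so at least one shift reaches the average. The particular `theta` and `x` are made-up integers chosen for the example, and the function names are hypothetical.

```python
def cyclic_shift(x, k):
    """X^k = (X_{k+1}, ..., X_N, X_1, ..., X_k)."""
    return x[k:] + x[:k]


def shifted_sums(theta, x):
    """S_k = sum_i theta_i * X_i^k for k = 1, ..., N."""
    n = len(x)
    return [sum(t * xi for t, xi in zip(theta, cyclic_shift(x, k)))
            for k in range(1, n + 1)]


if __name__ == "__main__":
    theta = [2, 1, 0, 1]      # nonnegative weights with sum equal to N = 4
    x = [1, -1, 1, 1]         # one outcome of the signs, with sum 2
    sums = shifted_sums(theta, x)
    print("shifted sums      :", sums)
    print("total of the sums :", sum(sums))
    print("(sum theta)(sum x):", sum(theta) * sum(x))
    print("best shift >= sum x:", max(sums) >= sum(x))
```

Because every component of x visits every position exactly once over the N shifts, the total of the shifted sums factorizes, which is the identity used just before (2.13) is concluded.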

Having  finished  with  the  probabilistic  preliminaries,  we  can  now  continue  with  the  main  part  of  the 
proof  of  Theorem  2.2.  We  will  start  with  the  proof  of  part  (a). 

Lemma 2.3. Suppose that the length n of an input sequence $u \in U_n$ is smaller than $2^{Nf(1/K) - 1} - N +
2\lceil N/K \rceil - 1$. Then, there exists some $h \in \{-K\delta/N, K\delta/N\}^N$ such that $\|u * h\|_\infty < \delta$.

Proof. Let n be as in the statement of the lemma. We will show the existence of such an h by showing
that a random element of $\{-K\delta/N, K\delta/N\}^N$ satisfies $\|u * h\|_\infty < \delta$ with positive probability. Indeed,
let h be such a random element, under the uniform distribution on $\{-K\delta/N, K\delta/N\}^N$. Then,

$$\Pr(\|u * h\|_\infty \ge \delta) \le \sum_{j=1}^{N+n} \Pr(|(u * h)_j| \ge \delta) = \sum_{j=\lceil N/K \rceil + 1}^{N + n - \lceil N/K \rceil + 1} \Pr(|(u * h)_j| \ge \delta)$$

$$\le (N + n - 2\lceil N/K \rceil + 1) \max_j \Pr(|(u * h)_j| \ge \delta), \tag{2.15}$$

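where the equality on the first line holds because for $j \le \lceil N/K \rceil$, we have

$$
|(u * h)_j| = \Big|\sum_{i=1}^{N} h_i u_{j-i}\Big| = \Big|\sum_{i=1}^{j-1} h_i u_{j-i}\Big| \le (j-1)\frac{K\delta}{N} \le \Big(\Big\lceil \frac{N}{K} \Big\rceil - 1\Big)\frac{K\delta}{N} < \delta,
$$

and for $j \ge N + n - \lceil N/K \rceil + 2$, we have

$$
|(u * h)_j| = \Big|\sum_{i=1}^{N} h_i u_{j-i}\Big| = \Big|\sum_{i=j-n}^{N} h_i u_{j-i}\Big| \le (N - j + n + 1)\frac{K\delta}{N} \le \Big(\Big\lceil \frac{N}{K} \Big\rceil - 1\Big)\frac{K\delta}{N} < \delta.
$$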


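Furthermore,

$$
\Pr(|(u * h)_j| \ge \delta) = \Pr\Big(\Big|\sum_{i=1}^{N} h_i u_{j-i}\Big| \ge \delta\Big)
= \Pr\Big(\Big|\frac{1}{N}\sum_{i=1}^{N} \frac{N h_i}{K\delta}\, u_{j-i}\Big| \ge \frac{1}{K}\Big) \le 2 \cdot 2^{-N f(1/K)}.
\qquad (2.16)
$$
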

The last inequality follows from Lemma 2.1(a), because the random variables $N h_i/(K\delta)$ are independent,
take values in $\{-1, 1\}$, and each value is equally likely. Combining (2.15) and (2.16), we conclude that


$$
\Pr(\|u * h\|_\infty \ge \delta) \le 2\,(N + n - 2\lceil N/K \rceil + 1)\, 2^{-N f(1/K)}.
\qquad (2.17)
$$

If $2(N + n - 2\lceil N/K \rceil + 1) < 2^{N f(1/K)}$, then the right-hand side of (2.17) is smaller than 1. This implies
that there exists some $h \in \{-K\delta/N,\ K\delta/N\}^N$ for which $\|h * u\|_\infty < \delta$. □
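For very small N, the existence claim of Lemma 2.3 can be explored by exhaustive search rather than by the probabilistic argument. The sketch below is a toy illustration: the function names are hypothetical, the convolution is the plain full convolution (its indexing differs from the paper's by a shift), and the search is exponential in N.

```python
from itertools import product


def convolve(u, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(u) + len(h) - 1)
    for i, ui in enumerate(u):
        for j, hj in enumerate(h):
            out[i + j] += ui * hj
    return out


def find_undetectable_system(u, N, K, delta):
    """Search {-K*delta/N, +K*delta/N}^N for h with max |(u*h)_j| < delta.

    Returns the first such h, or None if no sign pattern works.
    """
    level = K * delta / N
    for signs in product((-1, 1), repeat=N):
        h = [s * level for s in signs]
        if max(abs(v) for v in convolve(u, h)) < delta:
            return h
    return None


# Hypothetical short input: a system h is found whose output stays below delta,
# so h and -h are both consistent with an all-zero measured output.
h_found = find_undetectable_system([1, -1, 1], N=4, K=2, delta=1.0)
```

Any h returned here has $\|h\|_1 = K\delta$, so h and -h are two candidate systems at distance $2K\delta$ that the given input cannot tell apart.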
Suppose now that the length $n$ of the input sequence $u$ is as in Lemma 2.3, and let the unknown
system $h$ have the properties described in that lemma. Since $|(h * u)_i| < \delta$ for all $i$, there is a choice of
the disturbance sequence $d$ that leads to zero output. Consider next the case where the unknown system
is actually equal to $-h$. We also have $|(-h * u)_i| < \delta$ for all $i$, and a zero output sequence is still
possible. Thus, if the output sequence is equal to zero, both $h$ and $-h$ could be the true system. For any
identification algorithm, the worst-case error will be at least equal to one half of the distance of these
two systems, which is $\|h\|_1 = K\delta$. In fact, the same argument can be carried out if $h$ is replaced by
$(1 + \epsilon)h$, where $\epsilon > 0$ is small enough so that the property $(1 + \epsilon)|(h * u)_i| \le \delta$ holds. We can
conclude that the worst-case diameter will be at least $2(1 + \epsilon)K\delta$. We have therefore shown that if
$n < 2^{N f(1/K) - 1} - N + 2\lceil N/K \rceil - 1$, then $D_{N,n}(u) > 2K\delta$. Equivalently, $n^*(N, K) \ge 2^{N f(1/K) - 1} - N +
2\lceil N/K \rceil - 1$, which completes the proof of part (a).

We now turn to the proof of part (b) of the theorem. Part (a) implies that
$\liminf_{N \to \infty} (1/N) \log n^*(N, K) \ge f(1/K)$. The proof will be completed by showing that

$$
\limsup_{N \to \infty} (1/N) \log n^*(N, K) \le f(1/K).
$$

To show this, we have to show the existence of an input sequence $u$ of length close to $2^{N f(1/K)}$ that
results in an uncertainty set of diameter bounded by $2K\delta$. Although we are not able to provide an
explicit construction of such an input sequence, we will prove its existence using a probabilistic argument.
We now provide the details of the construction of the input sequence $u$. Let us fix some $\epsilon > 0$. Let
$M(N)$ be the smallest integer satisfying

$$
M(N) \ge 2^{N(f(\epsilon + 1/K) + 2\epsilon)}.
\qquad (2.18)
$$


For every $k \in \{1, \ldots, M(N)\}$, we choose a vector $u^k = (u_1^k, \ldots, u_N^k) \in \{-1, 1\}^N$. The input $u$ is then
defined by

$$
u = (u^1, u^2, \ldots, u^{M(N)}),
\qquad (2.19)
$$

and has length $N M(N)$.
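The block structure of this input can be sketched as follows, under the random choice of entries made in Lemma 2.4. The function names, the seed, and the 0-based indexing are assumptions of this illustration. The last helper computes the block inner product that appears in (2.21); with 0-based indexing it equals the convolution output at index mN - 1.

```python
import random


def build_input(N, M, seed=0):
    """Concatenate M blocks of N independent, equally likely +1/-1 entries."""
    rng = random.Random(seed)
    blocks = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
    u = [v for block in blocks for v in block]
    return blocks, u


def convolve(u, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(u) + len(h) - 1)
    for i, ui in enumerate(u):
        for j, hj in enumerate(h):
            out[i + j] += ui * hj
    return out


def block_response(block, h):
    """sum_{j=1}^{N} u_j^m * h_{N-j}, with h indexed 0, ..., N-1."""
    N = len(h)
    return sum(block[j - 1] * h[N - j] for j in range(1, N + 1))
```

At each block boundary the output depends on exactly one block of the input, which is why the M(N) events in (2.22) involve independent random vectors.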

Lemma 2.4. Let the input $u$ be constructed as in the preceding paragraph. Furthermore, suppose that the
entries of the vectors $u^k$ are independent random variables, with each value in the set $\{-1, 1\}$ being equally
likely. Then, there exists some $N_2(\epsilon)$ such that

$$
\Pr(\exists h \text{ such that } \|h\|_1 \ge K\delta,\ \|u * h\|_\infty \le \delta) < 1, \quad \forall N \ge N_2(\epsilon).
\qquad (2.20)
$$

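Proof. Let $Q_N$ be the left-hand side of (2.20). Notice that if $i$ is an integer multiple of $N$, with $i = mN$,
we have

$$
(u * h)_i = \sum_{j=1}^{N} u_j^m h_{N-j}, \quad i = mN.
\qquad (2.21)
$$

We then have

$$
\begin{aligned}
Q_N &= \Pr(\exists h \text{ such that } \|h\|_1 \ge K\delta,\ \|u * h\|_\infty \le \delta) \\
&= \Pr(\exists h \text{ such that } \|h\|_1 = K\delta,\ \|u * h\|_\infty \le \delta) \\
&= \Pr(\exists h \text{ such that } \|h\|_1 = N,\ \|u * h\|_\infty \le N/K) \\
&\le \Pr\Big(\exists h \text{ such that } \|h\|_1 = N,\ \Big|\sum_{j=1}^{N} u_j^m h_{N-j}\Big| \le N/K,\ m = 1, \ldots, M(N)\Big),
\end{aligned}
\qquad (2.22)
$$

where the last inequality follows from (2.21).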
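Let us choose a finite subset $\mathcal{H}_N'$ of $\mathcal{H}_N$ such that for every $h$ with $\|h\|_1 = N$, there exists some
$h' \in \mathcal{H}_N'$ satisfying $\|h'\|_1 = N$ and $\|h - h'\|_\infty \le \epsilon$. In particular, $\mathcal{H}_N'$ can be chosen as a subset of the set
of all elements of $\mathcal{H}_N$ for which each component is bounded by $N$ and is an integer multiple of $\epsilon$. It is
then clear that $\mathcal{H}_N'$ can be assumed to have cardinality bounded by $((2N + 1)/\epsilon)^N$. We then have

$$
\begin{aligned}
&\Pr\Big(\exists h \in \mathcal{H}_N \text{ such that } \|h\|_1 = N,\ \Big|\sum_{j=1}^{N} u_j^m h_{N-j}\Big| \le N/K,\ m = 1, \ldots, M(N)\Big) \\
&\quad\le \Pr\Big(\exists h' \in \mathcal{H}_N' \text{ such that } \Big|\sum_{j=1}^{N} u_j^m h'_{N-j}\Big| \le N(\epsilon + 1/K),\ m = 1, \ldots, M(N)\Big) \\
&\quad\le \Big(\frac{2N + 1}{\epsilon}\Big)^{N} \max_{h' \in \mathcal{H}_N'} \Pr\Big(\Big|\sum_{j=1}^{N} u_j^m h'_{N-j}\Big| \le N(\epsilon + 1/K),\ m = 1, \ldots, M(N)\Big).
\end{aligned}
\qquad (2.23)
$$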


We provide an upper bound to the probability in the right-hand side of (2.23) by applying Lemma 2.2.
(Here, $u_j^m$ and $h'_{N-j}$ correspond to $X_i$ and $\theta_i$ in the notation of that lemma.) Indeed, Lemma 2.2 is
applicable because $\|h'\|_1 = N$ and the components of the input are i.i.d. random variables, with the same
distribution as the variables $X_i$ of Lemma 2.1. A minor difference is that the components of $h'$ could be
negative, while in Lemma 2.2 we assumed that the components of $\theta$ are nonnegative. Nevertheless, if we
replace each component of $h'$ with its absolute value, the distribution of the random variables
$\sum_{j=1}^{N} u_j^m h'_{N-j}$ remains the same. We therefore conclude that there exists some $N_3(K, \epsilon)$ such that

$$
\Pr\Big(\Big|\sum_{j=1}^{N} u_j^m h'_{N-j}\Big| \le N(\epsilon + 1/K)\Big) \le 1 - 2^{-N(f(\epsilon + 1/K) + \epsilon)}, \quad \forall m,\ \forall N \ge N_3(K, \epsilon).
\qquad (2.24)
$$

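By combining (2.22), (2.23), (2.24), and using the statistical independence of the vectors $u^m$, we obtain

$$
\begin{aligned}
Q_N &\le \Big(\frac{2N + 1}{\epsilon}\Big)^{N} \big(1 - 2^{-N(f(\epsilon + 1/K) + \epsilon)}\big)^{M(N)} \\
&\le \Big(\frac{2N + 1}{\epsilon}\Big)^{N} \exp\{-M(N)\, 2^{-N(f(\epsilon + 1/K) + \epsilon)}\} \le \Big(\frac{2N + 1}{\epsilon}\Big)^{N} \exp\{-2^{N\epsilon}\},
\end{aligned}
\qquad (2.25)
$$

where the second inequality follows from the fact $(1 - 1/x)^x \le e^{-1}$, for every $x > 0$, and the last
inequality follows from the definition of $M(N)$ [cf. (2.18)]. It is then easily seen that $Q_N$ converges
to zero as $N$ increases, which establishes the desired result. □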
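The claim that the final bound in (2.25) tends to zero can be checked numerically by working with its logarithm, since the bound itself underflows quickly. This is only an illustrative sketch: the function name and the sample values of N and epsilon are assumptions.

```python
import math


def log_bound(N, eps):
    """Natural log of ((2N+1)/eps)^N * exp(-2^(N*eps)), the last expression in (2.25)."""
    return N * math.log((2 * N + 1) / eps) - 2.0 ** (N * eps)


# For small N the polynomial-type factor dominates and the bound exceeds 1
# (positive log); for large N the doubly exponential term takes over.
small_case = log_bound(20, 0.1)
large_case = log_bound(200, 0.1)
```

The first term grows like N log N while the second grows like $2^{N\epsilon}$, so the logarithm eventually decreases without bound, which is the convergence used at the end of the proof.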

Lemma 2.4 establishes that, if the input $u$ is constructed randomly as in the discussion preceding the
lemma, then, with positive probability, $u$ will have property P below:

P: if $h \in \mathcal{H}_N$ and $\|u * h\|_\infty \le \delta$, then $\|h\|_1 < K\delta$. (2.26)

In particular, there exists at least one $u$, of length $n = M(N)N$, that has property P.

Lemma 2.5. If an input $u$ has property P of (2.26), then $D_{N,n}(u) \le 2K\delta$.

Proof. We apply the input $u$ and measure the output $y = h * u + d$, where $h$ is the unknown plant and $d$
is the disturbance sequence. Given the observed output $y$, we can infer that $h$ belongs to the set of
uncertainty

$$
S_{N,n}(y, u) = \{\phi \in \mathcal{H}_N \mid \|y - \phi * u\|_\infty \le \delta\}.
$$

Let $\chi$ and $\psi$ be two elements of $S_{N,n}(y, u)$. Then, $\|y - \chi * u\|_\infty \le \delta$ and $\|y - \psi * u\|_\infty \le \delta$. Using
the triangle inequality, we obtain $\|u * (\chi - \psi)/2\|_\infty \le \delta$. Since $u$ has property P, we conclude that
$\|(\chi - \psi)/2\|_1 < K\delta$ or $\|\chi - \psi\|_1 < 2K\delta$. Since this is true for all elements of $S_{N,n}(y, u)$, the diameter of
$S_{N,n}(y, u)$ is at most $2K\delta$. □

3 In fact, it is easily seen that $Q_N$ converges to zero very rapidly, which implies that most $u$'s will have property P.
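The triangle-inequality step in this proof can be illustrated with a short consistency check. The sketch below is a toy example: the function names and all numerical values are hypothetical, and the convolution is the plain full convolution.

```python
def convolve(u, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(u) + len(h) - 1)
    for i, ui in enumerate(u):
        for j, hj in enumerate(h):
            out[i + j] += ui * hj
    return out


def sup_norm(v):
    """Largest absolute entry of a sequence."""
    return max(abs(a) for a in v)


def is_consistent(phi, y, u, delta):
    """True if phi belongs to the uncertainty set: ||y - phi * u||_inf <= delta."""
    residual = [yi - ci for yi, ci in zip(y, convolve(u, phi))]
    return sup_norm(residual) <= delta


def half_difference_output(chi, psi, u):
    """||u * (chi - psi)/2||_inf for two candidate systems."""
    half = [(a - b) / 2 for a, b in zip(chi, psi)]
    return sup_norm(convolve(u, half))
```

If two candidates are both consistent with the same data, the output produced by half their difference is bounded by delta, and property P then bounds half their distance in the 1-norm.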

As discussed earlier, if $N$ is large enough, there exists an input of length $n = M(N)N$ that has
property P and, by Lemma 2.5, leads to uncertainty sets whose diameter is bounded above by $2K\delta$. It
follows that $n^*(N, K) \le M(N)N$. Using the definition of $M(N)$ [cf. (2.18)], we see that

$$
\limsup_{N \to \infty} (1/N) \log n^*(N, K) \le \limsup_{N \to \infty} (1/N) \log \big(M(N)N\big) \le f\Big(\epsilon + \frac{1}{K}\Big) + 2\epsilon.
\qquad (2.27)
$$

Since Eq. (2.27) is valid for all $\epsilon > 0$, and since $f$ is continuous, we conclude that

$$
\limsup_{N \to \infty} (1/N) \log n^*(N, K) \le f(1/K),
$$

which concludes the proof of Theorem 2.2. □


3.  Conclusions 

This  paper  addresses  issues  in  the  sample  complexity  of  worst-case  identification  in  the  presence  of 
unknown  but  bounded  noise.  Two  main  results  are  furnished:  the  first  is  a  lower  bound  on  the  length  of 
inputs  necessary  to  approximate  N  steps  of  an  impulse  response  to  an  accuracy  within  a  factor  K  of  the 
best possible achievable error. This bound has the form $2^{N f(1/K)}$, and hence is exponential in $N$. The
second result shows that this lower bound is asymptotically tight, i.e., for large enough $N$, there exists an
input  of  length  close  to  the  lower  bound  that  allows  the  identification  of  N  steps  of  the  impulse 
response. 
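To get a feel for how quickly the bound $2^{N f(1/K)}$ grows, the exponent can be evaluated numerically. The definition of f lies outside this section, so the sketch below assumes the standard large-deviations exponent for sums of equally likely +1/-1 variables, f(alpha) = 1 - H((1 + alpha)/2) with H the binary entropy in bits. That formula, and the function names, are assumptions of this illustration and should be checked against the paper's own definition.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits; defined as 0 at the endpoints."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def f(alpha):
    """ASSUMED exponent: f(alpha) = 1 - H((1 + alpha) / 2)."""
    return 1.0 - binary_entropy((1.0 + alpha) / 2.0)


def lower_bound_length(N, K):
    """Leading term 2^(N f(1/K)) of the bound on the required input length."""
    return 2.0 ** (N * f(1.0 / K))
```

Under this assumed f, doubling N squares the leading term, so even a modest accuracy factor K forces input lengths that are exponential in the number of impulse response steps.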
Abstract

In this paper we present a new framework for iterative modeling
and control. We begin by describing the unknown process with an
uncertain model whose parametrization depends on prior information,
available control design tools and other modeling preferences. This is
formally presented as a model set transformation problem. The second
step is an iterative procedure for refining the uncertainty set via
robust control based model invalidation and can be viewed as a
systematic way of efficiently searching for a controller delivering a certain
desired level of performance to the unknown process. As a result, either
the performance goal will be met or the entire uncertainty set will
be invalidated in accordance with our modeling and control method
prejudice. An iterative scheme based on a special model structure and
rank one mixed $\mu$ synthesis will be described in detail and a specific
example will be used to illustrate the ideas.

1  Introduction 

Over the past decade, there has been much research activity in the area of
worst-case, or control-oriented, system identification. The motivation can be
attributed to new advances in robust control theory which did not interface
well with the existing theory of classical system identification. The main focus
of this research has been the design of algorithms that yield nominal models
along with measures of uncertainty which are well suited for robust control
design [11, 10, 28, 20, 14]. Unfortunately, these worst-case algorithms tend
to provide error bounds which are very conservative in practice [13] and are
therefore of limited utility. This is one motivation for the area of iterative
identification and control which has recently gained attention in the control
community. Several researchers have been working on the "connections"
between identification and control [1, 30, 31, 9, 17, 24, 25, 16, 15]. This
work has typically been in the spirit of adaptive control. In other words, a
sequence of nominal models is being identified, while a sequence of
corresponding robust controllers is being designed for these models. The hope is
that the models are, in some sense, getting closer to the unknown process
and the performance is improving.

The more recent formulation of Dahleh and Doyle [4] uses an entirely
different philosophy. Here, the goal is to observe some experimentally
generated finite set of data and find a controller that meets a given performance
specification for the unknown process. The model is thought of as a tool
which is chosen based on the designer's preferences of control design
techniques and ways of explaining the observed data. In this sense, the chosen
model parameterization is good only if a controller designed for this model
can also achieve good performance with the unknown process. On the other
hand, if a controller delivers good performance with the model, yet fails to
meet the performance specifications with the unknown process, this model
is considered to be a poor description of the process and should be
invalidated. In this way, a conservative model set can be effectively shrunk until
the remaining elements can deliver a controller which will achieve the
desired performance specifications on the actual process, or the whole set is
invalidated.

In this paper we develop this philosophy further and give a concrete
example of an iterative scheme based on such model invalidation through
robust control design. The next section considers the modeling step at a
general level. Section 3 describes the general philosophy of an iterative scheme
and comments on the computation issues in the general case. This is
followed by the development of an iterative scheme based on a fixed pole model
(FPM) and the rank one mixed $\mu$ synthesis (ROS) robust control technique.
Section 4 discusses the modeling step in more detail and presents some new
results in this direction. This is followed by a detailed description of an
iterative scheme based on the FPM and ROS. The computations associated with
the ROS scheme are outlined in Section 5.1 and some worst-case complexity
results are also collected there. Finally, a specific example of the ROS based
iterative scheme is considered in Section 7.
2 Step One: Selection of Model Parameterization

In general, the selection of the model parameterization is a process that
requires engineering insight as well as careful consideration of available
robust control techniques and available information about the process to be
controlled. Currently there are few robust control methods available and
the existing methods incorporate only special types of performance
objectives and uncertainty structures [6, 5, 29, 23, 7, 12]. All of these design
methods can accommodate unmodeled dynamics as uncertainty and norms
of weighted transfer functions as design specifications. The mixed-$\mu$ (rank
one) synthesis method of Rantzer and Megretskii [23] can also
nonconservatively accommodate parametric uncertainty models (having some
structural restrictions). Once the desired model structure, or parameterization,
is chosen, the problem is to efficiently map the prior information about the
unknown process into a model set having the desired structure. This step
can be extremely difficult and so the structure of the prior information can
significantly influence the choice of model parameterization.

This model set transformation step can be made more rigorous by
defining the original prior information set to be the set of models consistent with
the priors

$$
\mathcal{M}_{prior} = \{P(\alpha) : \alpha \in A_p \subset \mathbb{R}^n\} \qquad (1)
$$

where the prior information is embedded in $A_p$ and the functional
dependence of $P(\alpha)$ on $\alpha$. Note that any unmodeled dynamics are also contained
in $\mathcal{M}_{prior}$. Given the desired model structure parameterization, $G(\theta, \Delta)$, the
goal is to find the smallest $\epsilon > 0$ and $\Theta_0 \subset \mathbb{R}^m$ such that

$$
\mathcal{M}_{prior} \subset \mathcal{M}_{des} = \{G(\theta, \Delta) : \theta \in \Theta_0,\ \|\Delta\|_\infty \le \epsilon\}.
$$

This is generally a very difficult problem to solve and an approximate
solution for the special FPM case will be developed later in Section 4.


3  Step  Two:  Inner  Loop 

The main objective of an iterative scheme in our framework is to efficiently
search over the auxiliary (model) space for a controller which meets the
given performance objective with the actual unknown process. The idea is to
partition the model set and then efficiently invalidate subsets in the partition
while searching for a subset which yields a controller that meets the desired
performance objective for the actual process. A subset is invalidated if there
exists a controller which achieves a certain level of robust performance for
this set but fails to achieve the same level of performance for the actual
process. The key issues are choosing the model structure/parameterization
and deciding how to partition the model set. Clearly, the termination of
such an iterative scheme can occur only if performance can be tested using
a finite duration signal. We now make these ideas more rigorous before
stating the iterative procedure.

3.1  Preliminaries 

We first establish some useful notation. The model set $(\mathcal{M}_0, \mathcal{D})$ is assumed
to describe the unknown process $P_T$. The input/output relations for $P_T$
and some plant/noise pair $(h, d) \in (\mathcal{M}_0, \mathcal{D})$ are written as $y = P_T u$ and
$y = (h, d)u$, respectively. The reason for using this notation is that the
development remains general, and we are not forced to write $y = h * u + d$ or
$y = h * (u + d)$, for example. Let the process augmented by the exogenous
inputs $w$ and measured outputs $z$ be denoted by $P$. Now, assume that
some controller $K$ is used to close the loop around the augmented process.
The performance objective is to minimize (in some appropriate sense) the
measured output $z$ for some exogenous input $w$.

We assume that $w$ and $z$ lie in some spaces of sequences of $p$ and $q$
dimensional vectors. The relation for $w$ and $z$ is now some LFT of the process and
controller and will be represented as $\mathcal{F}_l(P, K)$. With a slight abuse of
notation we will similarly represent the closed loop relation with $(h, d)$ in place of
$P$ as $\mathcal{F}_l((h, d), K)$ and the set of relations with $(\mathcal{M}_0, \mathcal{D})$ as $\mathcal{F}_l((\mathcal{M}_0, \mathcal{D}), K)$.
Finally, given an exogenous signal $w_0$, assuming uniqueness of solutions in
the closed loop system, we can define a map $G$ which takes $(w_0, P, K)$ into
$u_0$, the input to the plant, which we write as $u_0 = G(w_0, P, K)$. We next
establish the way in which we view performance.

When an engineer designs a control system to achieve certain
performance objectives, the design is tested and will seldom give satisfactory
performance on the first try. The main point is that during the testing phase,
only a finite time experiment is available. This means that the engineer can
increase his/her confidence by observing the closed loop system for some
finite set of exogenous inputs thought to represent the typical signals which
the system will have to face in the future. Following this philosophy, we
view performance in relationship to some finite collection of finite duration
exogenous signals and the metrics used on the input and output spaces.

One can think of the standard model validation problem in the following
way. Given $(\mathcal{M}_0, \mathcal{D})$, $u$ and $y = P_T u$, if $(\mathcal{M}_0, \mathcal{D})$ is inconsistent with $(u, y)$,
then we say that $(\mathcal{M}_0, \mathcal{D})$ does not adequately describe $P_T$ with respect to the
input $u$. The computations associated with this consistency test are the main
focus of the recent work on model validation [26, 27, 22]. We now show that
performing model invalidation based on observed performance (i.e., control
based model invalidation) is related to standard model invalidation using a
special input, $u^*$, which is well defined.

Assume that a set of exogenous inputs, $W = \{w_i \in \mathbb{R}^{N \times p} \mid i \in [1, M_w]\}$,
is given and the goal is to achieve $\|\mathcal{F}_l(P, K) w_i\| \le \gamma \|w_i\|\ \forall i \in [1, M_w]$. The
choice  of  norms  here  dictates  which  control  design  methodology  one  will 
have  to  utilize  id.  191.  We  propose  the  following  alternative  way  of  thinking 
about  model  validation,  which  is  similar  to  the  idea  in  [4]. 

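Definition 3.1 Given a model set $(\mathcal{M}_0, \mathcal{D})$ and a $K$ which achieves
$\|\mathcal{F}_l((\mathcal{M}_0, \mathcal{D}), K) w_i\| \le \gamma \|w_i\|\ \forall i \in [1, M_w]$,
if $\exists j \in [1, M_w]$ s.t. $\|\mathcal{F}_l(P, K) w_j\| > \gamma \|w_j\|$,

then we say that $(\mathcal{M}_0, \mathcal{D})$ does not adequately describe $P$ in view of the
objectives.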
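The test in Definition 3.1 reduces to comparing finitely many measured signal norms, which a short sketch can make concrete. The function names, the choice of the Euclidean norm, and the scalar sample signals are assumptions of this illustration. The controller is taken to have already been certified for the whole model set, which the code does not check.

```python
import math


def l2_norm(signal):
    """Euclidean norm of a finite-duration scalar signal (an arbitrary choice here)."""
    return math.sqrt(sum(v * v for v in signal))


def invalidated_by_performance(z_measured, w_inputs, gamma, norm=l2_norm):
    """True if some experiment j shows ||z_j|| > gamma * ||w_j||.

    Assumes the controller in the loop achieves the gamma bound for every
    element of the model set, so any violation observed on the real process
    means the model set does not adequately describe it.
    """
    return any(norm(z) > gamma * norm(w) for z, w in zip(z_measured, w_inputs))
```

A return value of False does not certify the model set; as the section goes on to explain, observed performance is only a sufficient test for invalidation.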

In other words, if we can find a controller $K$ which achieves the desired
performance for $(\mathcal{M}_0, \mathcal{D})$, yet does not achieve it for the process $P_T$, we
invalidate $(\mathcal{M}_0, \mathcal{D})$. The following result establishes a connection between
this viewpoint and the standard model validation problem.

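Lemma 3.2 Given $(\mathcal{M}_0, \mathcal{D})$ and $W$, let $K$ be a controller which achieves
$\|\mathcal{F}_l((\mathcal{M}_0, \mathcal{D}), K) w_i\| \le \gamma \|w_i\|\ \forall i \in [1, M_w]$.

Furthermore, for each $i \in [1, M_w]$, let $u^i = G(w_i, P, K)$ and $y^i = P u^i$.

If $\exists \{(h_i, d_i)\} \subset (\mathcal{M}_0, \mathcal{D})$ s.t. $y^i = (h_i, d_i) u^i\ \forall i \in [1, M_w]$
then $\|\mathcal{F}_l(P, K) w_i\| \le \gamma \|w_i\|\ \forall i \in [1, M_w]$.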

Proof. Given $W$, take any $i \in [1, M_w]$. During the experiment, the closed
loop system generates unique $u^i$, $y^i$ and $z^i = \mathcal{F}_l(P, K) w_i$. By assumption,
$y^i = (h_i, d_i) u^i$ (i.e., $(h_i, d_i)$ can interpolate $u^i$ to $y^i$). Because of uniqueness
of solutions, if we consider $(h_i, d_i)$ in the loop instead of $P$, the resulting $z$
must be the same as $z^i$ from the experiment. This means exactly that

$$
\mathcal{F}_l(P, K) w_i = \mathcal{F}_l((h_i, d_i), K) w_i \qquad (2)
$$

and since $K$ was designed to achieve $\|\mathcal{F}_l((\mathcal{M}_0, \mathcal{D}), K) w_i\| \le \gamma \|w_i\|$, it
certainly achieves $\|\mathcal{F}_l((h_i, d_i), K) w_i\| \le \gamma \|w_i\|$. In view of equation 2, it is clear
that this implies $\|\mathcal{F}_l(P, K) w_i\| \le \gamma \|w_i\|$. □

A picture showing how these variables are related in the closed loop is shown
in Figure 1.

Figure 1: Closed-Loop System Variables Depending on w


This lemma says that if we cannot invalidate the set $(\mathcal{M}_0, \mathcal{D})$ w.r.t.
$(u^i, y^i)$ for any $i \in [1, M_w]$, then we will not be able to invalidate it based
on observing the performance w.r.t. $W$. Notice that the reverse implication
is not necessarily true because $K$ is not designed to achieve performance
only for $(\mathcal{M}_0, \mathcal{D})$. Even if $(\mathcal{M}_0, \mathcal{D})$ is not consistent with $(u^i, y^i)$, $K$ may
inadvertently achieve the desired performance for some $(h, d) \notin (\mathcal{M}_0, \mathcal{D})$
which is consistent with $(u^i, y^i)$.

The above lemma can be easily extended to performance objectives such
as $\|W_p \mathcal{F}_l(P, K) w_i\| \le \gamma \|w_i\|$, where $W_p$ is some weighting function. In fact,
it is easy to see that the result holds for any performance objective that is
implied by the robust control design method. In the iterative scheme, the
contrapositive of this lemma is actually used and is stated here as a corollary.

Corollary 3.3 Given the assumptions of Lemma 3.2, if $\exists j \in [1, M_w]$ s.t.
$\|\mathcal{F}_l(P, K) w_j\| > \gamma \|w_j\|$, then there does not exist any $(h, d) \in (\mathcal{M}_0, \mathcal{D})$
s.t. $y^j = (h, d) u^j$.

This says that if we invalidate the set $(\mathcal{M}_0, \mathcal{D})$ based on performance, then
we would have also invalidated it by using the resulting $(u^j, y^j)$. It is
important to reiterate the significance of the unidirectional implication in Lemma
3.2 and Corollary 3.3. This means that even though the observed
performance is satisfied, $(u^i, y^i)$ still may not be consistent with $(\mathcal{M}_0, \mathcal{D})$. In this
way, the performance test is only sufficient for invalidation of $(\mathcal{M}_0, \mathcal{D})$ w.r.t.
$(u^i, y^i)$. This is not a deficiency in the procedure, but merely reflects the
fact that model invalidation in the sense of Definition 3.1 is only sufficient
for model invalidation in the traditional sense. In other words, as long as
the controller we have designed for $(\mathcal{M}_0, \mathcal{D})$ is delivering the desired
performance for the process $P$, we have no reason to invalidate the set $(\mathcal{M}_0, \mathcal{D})$.

3.2  Inner  Loop  and  Computations 

The following two additional assumptions are needed since a
nonconservative robust control technique does not exist for every type of performance
objective.

Assumption 1 There exists a robust control technique which implies the
performance objective above (possibly conservative).

Assumption 2 We accept this robust control technique in the sense that
if it cannot come up with a controller satisfying a given performance
objective, we assume that no controller can satisfy it.

We are now in a position to describe the inner loop of the iterative scheme,
which we show in the flow chart in Figure 2.

The  computational  difficulty  is  imbedded  in  the  robust  control  design 
step  while  the  efficiency  is  controlled  by  the  partition  of  the  model  set  and 
selection  of  the  candidate  subsets.  Given  the  desired  performance  objective, 
it  may  be  difficult  to  determine  how  small  a  subset  of  the  partition  should 
be  for  the  robust  control  problem  to  have  a  feasible  solution.  This  means 
that  a  practical  scheme  may  be  based  on  further  refining  the  subsets  until 
the  robust  control  problem  is  solved  and  only  then  trying  these  controllers 
on  the  actual  process.  This  is  exactly  what  is  done  in  the  iterative  scheme 
described  in  Section  6. 

This general description allows for any model structure and
corresponding partition of the model set to be used. In accordance with the above
additional assumptions, we only require a robust control technique which
implies the given performance specification type. If the robust control
design method is conservative, this will be reflected in the conservatism of the
iterative scheme in which it is used. Finally, it is important that a finite
partition is used to ensure termination in finite time. If, for example, the
performance objective is too difficult and cannot be met for any plant in
the model set, one does not want to refine the partition ad infinitum. There
should be some chosen partition level at which a subset giving no feasible
robust control solution will be invalidated.
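The inner loop described above can be sketched as a search over a fixed, finite partition. Everything in the block is hypothetical: the function names, the scalar gain "plant", and the tolerance stand in for a real robust design method and a real experiment, and the partition is not refined adaptively. A subset is discarded either because no robust controller is found for it (Assumption 2 at the chosen partition level) or because its controller fails on the actual process.

```python
def inner_loop(subsets, design_controller, meets_performance_on_process):
    """Search a finite partition for a controller that works on the real process.

    design_controller(subset) returns a controller, or None if the robust
    design problem is infeasible for that subset.
    meets_performance_on_process(controller) runs the finite experiment.
    Returns (controller or None, list of invalidated subsets).
    """
    invalidated = []
    for subset in subsets:
        controller = design_controller(subset)
        if controller is None:
            invalidated.append(subset)  # no feasible robust design at this level
            continue
        if meets_performance_on_process(controller):
            return controller, invalidated  # performance satisfied: stop
        invalidated.append(subset)  # certified on the subset, fails on the process
    return None, invalidated  # entire model set invalidated


def design_gain_controller(subset, tol=0.25):
    """Toy robust design: the plant is an unknown gain g in [lo, hi], the controller
    is k = 1/midpoint, and robust performance means |g*k - 1| <= tol on the interval."""
    lo, hi = subset
    k = 1.0 / (0.5 * (lo + hi))
    worst = max(abs(lo * k - 1.0), abs(hi * k - 1.0))
    return k if worst <= tol else None
```

In this toy setting, a true gain of 3.4 with the partition (0,1), (1,2), (2,3), (3,4) discards the first two intervals for infeasibility and the third by experiment, then stops at the fourth.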


[Flow chart: partition model set M; decision "Is Performance Satisfied?" with No and Yes branches; Yes leads to STOP.]

Figure 2: Iterative Procedure
4 Fixed Pole Model for Rank One Synthesis


The ROS design method was developed by Rantzer and Megretskii in [23].
The method is limited to special model structures (i.e., SISO, MISO and
SIMO with real and complex coprime factor perturbations); however, the
solution is a convex optimization problem. If we also require robust
performance such as minimizing a weighted sensitivity (i.e., $\|W_p S\|$), the
uncertainty can only be in the numerator of the model if the rank one structure
is to be maintained. This will be described in more detail in Section 5.1.
The corresponding model will be referred to as the fixed pole model (FPM)
and is given by

$$
G(\theta, \Delta) = \frac{\sum_{k=0}^{m-1} \theta_k z^k}{A(z)} + W\Delta
$$

where $\|\Delta\|_\infty \le 1$, $\theta \in \Theta \subset \mathbb{R}^m$, $W$ is a stable and invertible weighting
function, and $A(z)$ is a stable polynomial.

When the prior information is given in terms of uncertain pole locations
or other structures which are not compatible with fixed poles, the mapping
of these priors into the appropriate parameter set $\Theta$ and a weighting function
$W$ can be difficult. In particular, when $W = \epsilon W_0$, one would like to compute
the smallest $\epsilon > 0$ and the corresponding set $\Theta$ such that the FPM set
contains all the plants given by the prior information. It can easily be
shown that computing $\epsilon$ is equivalent to computing the n-width of the prior
model set [18], and finding the corresponding parameter uncertainty set $\Theta$
is also a difficult task. For this model and the iterative scheme described in
Section 6, it is important to choose $W$ with very little conservatism (i.e., try
to lump as much of the pole mismatch into the parametric uncertainty as
possible). This is true because the iterative scheme will reduce uncertainty
in the parametric part, while the $W\Delta$ part will remain fixed. We next
discuss the modeling step for the FPM in more detail.


4.1  Approximate  Solution  to  Model  Set  Transformation:  FPM 
Case 

We assume that the prior model set is stable and of the form

$$
\mathcal{M}_{prior} = \{P(\alpha) + W\Delta : \alpha \in A_p,\ \|\Delta\|_\infty \le 1\}
$$

where $P(\alpha) = B(z)/(z^n + \alpha_1 z^{n-1} + \cdots + \alpha_n)$, but it is sufficient to consider
just the parametric part $P(\alpha)$ since any additive unmodeled dynamics can
be added to the FPM model set. Since we do not know how to solve the
exact n-width problem by finding the optimal fixed pole locations for $A(z)$,
we use the following approach. Let $A_c(z)$ be the denominator polynomial of
$P(\alpha_c)$ where $\alpha_c$ is the center of the set $A_p$. We now use

$$
\frac{\sum_{k=0}^{m-1} \theta_k z^k}{(A_c(z))^{\lfloor m/n \rfloor}}
$$

to represent the actual plant. This means that we are using basis functions of
the form $\{z^k/(A_c(z))^{\lfloor m/n \rfloor}\}$. We now define $A(z) = (A_c(z))^{\lfloor m/n \rfloor}$ and $H_k(z) = z^k/A(z)$.
This means that the new model parameterization is given by the subspace
$\{\theta^T H : \theta \in \mathbb{R}^m\}$. This results in the simplified problem

$$
\epsilon^* = \sup_{\alpha \in A_p}\ \inf_{\theta \in \mathbb{R}^m} \|P(\alpha) - \theta^T H\|_\infty \qquad (4)
$$

This problem is equivalent to finding the maximum deviation from the set
$\mathcal{M}_{prior}$ to a fixed finite dimensional subspace given by $H^T\theta$. We will need
the following result for a simplification of the above problem.
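Let the state space representation for $\theta^T H$ be

$$
\theta^T H(z) = \left[\begin{array}{c|c} A_\theta & B_\theta \\ \hline C_\theta & D_\theta \end{array}\right].
$$

We can assume WLOG that $D_\theta = 0$ since the constant term can always be
matched exactly. Furthermore, in this case we can write

$$
\theta^T H = \frac{\sum_{k=0}^{m-1} \theta_k z^k}{z^m + \sum_{k=0}^{m-1} h_k z^k}
$$

and then simply take

$$
A_\theta = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-h_0 & -h_1 & \cdots & -h_{m-2} & -h_{m-1}
\end{bmatrix},
\qquad
B_\theta = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},
$$

and $C_\theta = \theta^T$. Similarly, for a given $\alpha \in A_p$, let the plant $P(\alpha)$ (modulo the
DC value) in the prior model set have a representation

$$
\left[\begin{array}{c|c} A_p(\alpha) & B_p(\alpha) \\ \hline C_p(\alpha) & 0 \end{array}\right].
$$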
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This  means  that 


the  system  P(a) 

A  i  B 

c !  o 


-0T H  has  the  state  space 

A9  0 

A  ~  0  Ap{a) 


representation 


B  = 


Bft 

Bp{a) 


C  =  i  Cp(a) 
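The companion-form realization above can be checked numerically. The following is a minimal sketch in plain Python, assuming the identity $(zI - A_\theta)^{-1} B_\theta = [1, z, \ldots, z^{m-1}]^T / (z^m + \sum_k h_k z^k)$ for this canonical form; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath

def companion_realization(h, theta):
    """Canonical (A, B, C) for sum(theta_k z^k) / (z^m + sum(h_k z^k))."""
    m = len(h)
    A = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        A[i][i + 1] = 1.0            # ones on the superdiagonal
    A[m - 1] = [-hk for hk in h]     # last row: -h_0 ... -h_{m-1}
    B = [0.0] * (m - 1) + [1.0]
    C = list(theta)                  # C_theta = theta^T
    return A, B, C

def check_realization(A, B, C, h, z):
    """Return C (zI - A)^{-1} B and the residual of (zI - A) v = B p(z)."""
    m = len(h)
    v = [z ** k for k in range(m)]                         # [1, z, ..., z^{m-1}]
    p = z ** m + sum(hk * z ** k for k, hk in enumerate(h))
    residual = max(
        abs(z * v[i] - sum(A[i][j] * v[j] for j in range(m)) - B[i] * p)
        for i in range(m)
    )
    value = sum(c * vk for c, vk in zip(C, v)) / p
    return value, residual

# Illustrative values only (m = 2).
h = [0.9422, -1.9289]
theta = [0.0119, 0.0121]
z = cmath.exp(0.3j)
A, B, C = companion_realization(h, theta)
value, residual = check_realization(A, B, C, h, z)
direct = (theta[0] + theta[1] * z) / (z ** 2 + h[1] * z + h[0])
```

The residual should vanish up to rounding, and `value` should agree with the directly evaluated rational function `direct`.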


Lemma 4.1 Given the definitions above, for any $a \in A_p$,

$$\inf_{\theta \in \mathbb{R}^m} \|P(a) - H^T\theta\|_\infty = \inf\{\gamma : (*)\}$$

where the condition (*) is equivalent to

$$\exists X > 0 \ \text{s.t.} \quad \begin{bmatrix} A^T X A - X & A^T X B + C^T D & C^T \\ B^T X A + D^T C & B^T X B + D^T D - \gamma^2 I & 0 \\ C & 0 & -I \end{bmatrix} < 0$$

Proof. The system $P(a) - \theta^T H$ has a state space representation with $C$ affine in $\theta$. It follows almost directly from a theorem in Zhou, et al. [32] that $\|P(a) - \theta^T H\|_\infty < \gamma$ if and only if

$$\exists X > 0 \ \text{s.t.} \quad \begin{bmatrix} A^T X A + C^T C - X & A^T X B + C^T D \\ B^T X A + D^T C & B^T X B + D^T D - \gamma^2 I \end{bmatrix} < 0$$

This is an LMI in $X$ and $\gamma^2$, but not in $\theta$ since this matrix inequality is quadratic in $C$, and $C$ is an affine function of $\theta$. The result follows after performing a similar trick to the one used in Boyd, et al. [3]. □


Note that this is now an LMI in the variables $\gamma$, $\theta$ and $X$, and can be readily solved using various efficient interior point algorithms given in [21].

We can now present an algorithm for computing $\epsilon^*$ and the corresponding set $\Theta_0$ such that

$$\mathcal{M}_{prior} \subset \mathcal{M}_{des} = \{\theta^T H + \epsilon^* \Delta : \theta \in \Theta_0,\ \|\Delta\|_\infty < 1\}$$

The main idea is to relax the problem to one of finding $\epsilon^*$ within a small constant $0 < \eta$ which is chosen a priori. Because of certain continuity properties, this $\eta$ determines an $\epsilon$-net for the set $A$. An LMI is then solved for each lattice point and yields a finite set of optimal $\theta$'s and $\epsilon$'s. In the end, $\epsilon^*$ is bounded by $\eta$ + the maximum over the computed $\epsilon$'s, and the set $\Theta_0$ can be taken as any set which contains the finite set of all the optimal $\theta$'s. We now make this more rigorous.

Assume that $A_p$ is compact, $P(a)$ is stable for all $a \in A_p$, and there exists a finite constant, $M$, such that

$$\sup_{a \in A_p} \|P(a)\|_\infty < M < \infty .$$

Let $a$ enter affinely into the denominator of $P(a)$. We will show a bit later that the function mapping $a \mapsto \|P(a)\|_\infty$ is uniformly continuous on $A$. This statement is equivalent to saying that for any $\eta > 0$ there exists a $\delta^*$ such that for all $a_1, a_2 \in A_p$

$$\|P_{a_1} - P_{a_2}\|_\infty < \eta \quad \text{whenever} \quad \|a_1 - a_2\| < \delta^*$$

This suggests the following algorithm for computing $\epsilon^*$ and $\Theta_0$.

1.  Choose some $0 < \eta \ll 1$ and compute the corresponding $\delta^*$ (the computation will be discussed shortly).

2.  Set up a lattice $\{a_j\}_{j=1}^N$ such that the union of the $\delta^*$-neighborhoods centered at the lattice points is a finite cover for the set $A$:

$$A \subset \bigcup_{j=1}^{N} B_{\delta^*}(a_j)$$

3.  For each $j \in [1, N]$, solve (via an LMI)

$$\epsilon_j = \inf_{\theta} \|P(a_j) - \theta^T H\|_\infty \quad \text{and} \quad \theta_j = \arg\min_{\theta} \|P(a_j) - \theta^T H\|_\infty$$

and record the pairs $\{(\epsilon_j, \theta_j)\}$.

4.  Take $\epsilon^* = \eta + \max_j\{\epsilon_j\}$ and $\Theta_0$ as the smallest hypercube containing the points $\{\theta_j\}$.

This has the following properties.

Theorem 4.2 Given the above algorithm define $\bar\epsilon = \max_j \epsilon_j$. Then

$$\bar\epsilon \le \sup_{a \in A_p} \inf_{\theta} \|P(a) - \theta^T H\|_\infty < \bar\epsilon + \eta$$

and

$$\{P(a) : a \in A_p\} \subset \{\theta^T H + \Delta : \theta \in \{\theta_j\},\ \|\Delta\|_\infty \le \bar\epsilon + \eta\}$$

Proof. The lower bound in the first part is immediate. The upper bound is proven by defining $a_*$ to be the maximizing point in $A_p$. But then there exists a $k \in [1, N]$ such that $\|a_k - a_*\| < \delta^*$ which means that $\|P(a_k) - P(a_*)\|_\infty < \eta$. It is easy to see that for any $\theta \in \Theta_0$ we have

$$\|P(a_*) - \theta^T H\|_\infty = \|P(a_*) - \theta^T H + P(a_k) - P(a_k)\|_\infty \le \|P(a_k) - \theta^T H\|_\infty + \|P(a_k) - P(a_*)\|_\infty$$

and, taking $\theta = \theta_k$, the right-hand side is

$$< \epsilon_k + \eta \le \bar\epsilon + \eta$$

The second part of the theorem is immediate from the above argument. □
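Steps 2 through 4 of the algorithm can be sketched as follows. This is only an illustration under stated assumptions: the parameter set is taken to be an axis-aligned box, distances are measured in the infinity norm, the set $\Theta_0$ is returned as an axis-aligned bounding box, and the per-point LMI solve is replaced by a user-supplied callable (`solve_fit`, a hypothetical name) returning the pair $(\epsilon_j, \theta_j)$. The toy solver below is a stand-in and has no control-theoretic meaning.

```python
import itertools
import math

def lattice(box, delta):
    """Grid points whose open delta-balls (infinity norm) cover the box."""
    axes = []
    for lo, hi in box:
        n = int(math.floor((hi - lo) / (2.0 * delta))) + 1   # cells per axis
        step = (hi - lo) / n                                 # step / 2 < delta
        axes.append([lo + (i + 0.5) * step for i in range(n)])
    return list(itertools.product(*axes))

def transform_model_set(box, eta, delta, solve_fit):
    """Return eps_star = eta + max_j eps_j and a bounding box of the theta_j."""
    results = [solve_fit(a) for a in lattice(box, delta)]    # (eps_j, theta_j)
    eps_star = eta + max(eps for eps, _ in results)
    m = len(results[0][1])
    cube = [
        (min(th[k] for _, th in results), max(th[k] for _, th in results))
        for k in range(m)
    ]
    return eps_star, cube

# Hypothetical stand-in for the LMI solve at a lattice point a.
toy_fit = lambda a: (0.01 * abs(a[0]), [a[0], 2.0 * a[1]])

box = [(-1.9456, -1.9122), (0.9274, 0.9570)]
points = lattice(box, delta=0.005)
eps_star, cube = transform_model_set(box, eta=0.001, delta=0.005, solve_fit=toy_fit)
```

Choosing the number of cells as one more than the integer part of the width divided by $2\delta$ keeps every point of the box strictly within $\delta$ of some lattice point, which is what the covering condition in step 2 requires.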

We now show that the function $a \mapsto \|P(a)\|_\infty$ is indeed uniformly continuous on $A_p$ and in the process, show the steps necessary to compute $\delta^*$. We state the main result in the following theorem.

Theorem 4.3 Let $A_p \subset \mathbb{R}^n$ be compact and the set $\{P(a) : a \in A_p\}$ is a stable subset of $H_\infty$. Assume that $P(a) = N/D_a$ where $D_a$ is affine in $a$. Then the map $a \mapsto \|P(a)\|_\infty$ is uniformly continuous on $A_p$, meaning

$$\forall \eta > 0,\ p \in [1, \infty],\ \exists \delta^* \ \text{s.t.}\ \forall a', a'' \in A_p,\quad \|a' - a''\|_p < \delta^* \Rightarrow \|P_{a'} - P_{a''}\|_\infty < \eta$$

Furthermore, $\delta^*$ satisfies

$$\delta^* \le \frac{\eta}{n^{1/q} M_f}, \qquad \frac{1}{p} + \frac{1}{q} = 1$$

and the computation of $M_f$ requires either 2 mixed $\mu$ analyses or, with a bit more conservatism, 1 mixed $\mu$ analysis and 1 LMI solution.


Proof. We first define

$$D_a(e^{i\omega}) = 1 + a^T \xi(\omega) \quad \text{where} \quad \xi(\omega) = [e^{i\omega} \ \cdots \ e^{in\omega}]^T$$

and collect a few results. First, because of stability of the set $\{P(a) : a \in A_p\}$, we can compute an upper bound

$$\sup_{a \in A_p} \sup_{\omega \in [0, 2\pi]} \frac{1}{|1 + a^T \xi(\omega)|} \le M_D < \infty \qquad (5)$$

and use this to get

$$\inf_{a \in A} \inf_{\omega \in [0, 2\pi]} |1 + a^T \xi(\omega)| = \frac{1}{\sup_{a \in A} \sup_{\omega \in [0, 2\pi]} \frac{1}{|1 + a^T \xi(\omega)|}} \ge \frac{1}{M_D} > 0 .$$

As shown in [18], the computation in Equation 5 requires the solution of one mixed $\mu$ analysis problem. Next, using the same argument, we can compute $\sup_{a \in A_p} \|P(a)\|_\infty$ via another mixed $\mu$ analysis or by an additional LMI as follows.

$$\sup_{a \in A_p} \|P(a)\|_\infty \le \|N\|_\infty \sup_{a \in A_p} \left\| \frac{1}{D_a} \right\|_\infty$$

The second quantity was computed in Equation 5 while $\|N\|_\infty$ can be computed via a simple LMI problem. Whichever solution is used, let the computed upper bound for $\sup_{a \in A_p} \|P(a)\|_\infty$ be denoted by $M_P$. We next define the function

$$g(a, \omega) = \frac{1}{|1 + a^T \xi(\omega)|^2} = \frac{1}{1 + 2 a^T \mathrm{Re}\{\xi(\omega)\} + a^T \xi(\omega) \xi^*(\omega) a}$$

which we can now differentiate w.r.t. $a_k$ for all $k \in [1, n]$ to get

$$\frac{\partial g(a, \omega)}{\partial a_k} = -\frac{2\cos(k\omega) + 2\mathrm{Re}\{e^{-ik\omega} a^T \xi(\omega)\}}{(1 + 2 a^T \mathrm{Re}\{\xi(\omega)\} + a^T \xi(\omega) \xi^*(\omega) a)^2}$$

We can use implicit differentiation to show that

$$\frac{\partial}{\partial a_k} \frac{1}{|1 + a^T \xi(\omega)|} = \frac{|1 + a^T \xi(\omega)|}{2} \frac{\partial g(a, \omega)}{\partial a_k}$$

and combine everything above to get

$$\left| \frac{\partial |P(a)(e^{i\omega})|}{\partial a_k} \right| = \frac{|N(e^{i\omega})| \, |1 + a^T \xi(\omega)|}{2} \left| \frac{\partial g(a, \omega)}{\partial a_k} \right| = \left| \frac{N(e^{i\omega})}{1 + a^T \xi(\omega)} \right| \frac{|\cos(k\omega) + \mathrm{Re}\{e^{-ik\omega} a^T \xi(\omega)\}|}{|1 + a^T \xi(\omega)|^2} \qquad (6)$$

$$\le M_P M_D^2 (1 + \|a\|_p n^{1/q}) \le M_P M_D^2 (1 + n^{1/q} M_A) \le M_f \qquad (7)$$

where $M_A = \sup_{a \in A} \|a\|_p$ is bounded since $A$ is compact.

We can use this bound as follows. First, recall the multivariable mean value theorem. If a function $f : \mathbb{R}^n \to \mathbb{R}$ has bounded derivatives, then for any $x, y \in \mathbb{R}^n$ we have

$$f(y) - f(x) = \nabla f(\tilde{x}) \cdot (y - x)$$

for some $\tilde{x} = tx + (1 - t)y$, $t \in [0, 1]$ (i.e., $\|y - \tilde{x}\| \le \|y - x\|$). This means that if $|\partial f / \partial x_k| \le L$ for all $k \in [1, n]$,

$$|f(y) - f(x)| \le n^{1/q} L \|x - y\|_p .$$

We can use this fact to show that for any $\omega \in [0, 2\pi]$ and any $a', a'' \in A_p$, we have

$$|P_{a'}(e^{i\omega}) - P_{a''}(e^{i\omega})| \le n^{1/q} M_f \|a' - a''\|_p$$

which obviously implies that

$$\|P_{a'} - P_{a''}\|_\infty \le n^{1/q} M_f \|a' - a''\|_p$$

This means that one can choose $\delta^* < \eta/(n^{1/q} M_f)$ and this completes the proof. □
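As a worked illustration of the final bound, the sketch below evaluates $\eta/(n^{1/q} M_f)$ with $M_f$ taken as $M_P M_D^2 (1 + n^{1/q} M_A)$ and $q$ the Hölder conjugate of $p$. The constants passed in are placeholders; in the text $M_D$ and $M_P$ come from mixed $\mu$ analysis or an LMI, which this sketch does not attempt to compute.

```python
def holder_conjugate(p):
    """Return q with 1/p + 1/q = 1, allowing p = 1 and p = infinity."""
    if p == 1:
        return float("inf")
    if p == float("inf"):
        return 1.0
    return p / (p - 1.0)

def delta_star_bound(eta, n, p, M_P, M_D, M_A):
    """Evaluate eta / (n^(1/q) * M_f), M_f = M_P * M_D^2 * (1 + n^(1/q) * M_A)."""
    q = holder_conjugate(p)
    n_pow = n ** (1.0 / q)               # n^(1/q); equals 1 when q is infinite
    M_f = M_P * M_D ** 2 * (1.0 + n_pow * M_A)
    return eta / (n_pow * M_f)

# Placeholder constants, for illustration only.
bound_inf = delta_star_bound(eta=0.01, n=2, p=float("inf"), M_P=1.0, M_D=1.0, M_A=1.0)
bound_one = delta_star_bound(eta=0.01, n=2, p=1, M_P=1.0, M_D=1.0, M_A=1.0)
```

Any $\delta^*$ strictly below the returned value satisfies the condition at the end of the proof. A larger $M_D$ (poles closer to the unit circle) shrinks the admissible $\delta^*$ quadratically, so the lattice in the gridding algorithm becomes correspondingly finer.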


5  Rank  One  Mixed  $\mu$  Synthesis

In  this  section  we  briefly  review  the  rank  one  synthesis  (ROS)  result  of 
Rantzer  and  Megretski  [23]  and  specialize  it  to  the  fixed-pole  model  (FPM). 


5.1  Fixed-Pole  Model  as  Perturbed  Coprime  Factors 


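We first show that the fixed-pole model (FPM) having a hypercube for the parameter uncertainty set is a special case of the perturbed coprime model (PCM). The PCM is the model used in the rank one synthesis theory and is of the form (SISO case)

$$G_{\delta,\Delta} = \frac{N + \delta^T N_\delta + \Delta N_\Delta}{M + \delta^T M_\delta + \Delta M_\Delta}$$

with $N, M \in RH_\infty$, $(N, M)$ coprime, $\delta \in \mathbb{R}^m$, $\|\delta\|_\infty < 1$, $\Delta \in RH_\infty$, $\|\Delta\|_\infty < 1$, $N_\delta, M_\delta \in RH_\infty^m$, $N_\Delta, M_\Delta \in RH_\infty$, $M_\delta = 0$, and $M_\Delta = 0$. The fixed-pole model we want is given by

$$G_{\theta,\Delta} = \frac{B(\theta)}{A} + W\Delta$$

where $B(\theta) = \sum_{k=0}^{m-1} \theta_k z^k$, $\theta \in \Theta \subset \mathbb{R}^m$, $\Theta$ is a hypercube which is centered at $\theta_c$ and has side lengths $\{\eta_k\}$, and $\|\Delta\| < 1$. This corresponds to the PCM model above with $M = 1$, $N = B(\theta_c)/A$, $N_\Delta = W$, $M_\Delta = 0$, $M_\delta = 0$, and

$$N_\delta = \frac{[\eta_0 \ \ \eta_1 z \ \cdots \ \eta_{m-1} z^{m-1}]^T}{A}$$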
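The correspondence between the parametric part of the FPM and the PCM can be checked pointwise. The sketch below assumes the parameterization $\theta_k = \theta_{c,k} + \eta_k \delta_k$ with $|\delta_k| < 1$ (so that $\eta_k$ acts as the scaling of the $k$-th perturbation) and ignores the $W\Delta$ term; all numerical values are hypothetical.

```python
import cmath

def poly(coeffs, z):
    """Evaluate sum_k coeffs[k] * z^k."""
    return sum(c * z ** k for k, c in enumerate(coeffs))

def fpm_value(theta, A_coeffs, z):
    """Parametric part B(theta)/A of the fixed-pole model at the point z."""
    return poly(theta, z) / poly(A_coeffs, z)

def pcm_value(theta_c, eta, delta, A_coeffs, z):
    """(N + delta^T N_delta) / M with M = 1, N = B(theta_c)/A, N_delta[k] = eta_k z^k / A."""
    A = poly(A_coeffs, z)
    N = poly(theta_c, z) / A
    N_delta = [e * z ** k / A for k, e in enumerate(eta)]
    return N + sum(d * nd for d, nd in zip(delta, N_delta))

# Illustrative numbers only.
theta_c = [0.0119, 0.0121]
eta = [0.002, 0.001]
delta = [0.5, -0.25]
theta = [tc + e * d for tc, e, d in zip(theta_c, eta, delta)]
A_coeffs = [0.9422, -1.9289, 1.0]          # z^2 - 1.9289 z + 0.9422
z = cmath.exp(0.4j)
gap = abs(fpm_value(theta, A_coeffs, z) - pcm_value(theta_c, eta, delta, A_coeffs, z))
```

The two evaluations agree up to rounding because $B(\theta)$ is linear in $\theta$, which is exactly why the hypercube uncertainty maps onto the numerator perturbation $\delta^T N_\delta$.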
We now incorporate the robust performance objective which is given by $\|W_p S\|_\infty < \gamma$. This can be transformed into a robust stability problem as follows. Define $G_a = G_{\delta,\Delta}(1 - \Delta_p W_p)^{-1}$ which can also be expressed in terms of the uncertain coprime model as

$$G_a = \frac{N + \delta^T N_\delta + \Delta N_\Delta + \Delta_p N_{\Delta_p}}{M + \delta^T M_\delta + \Delta M_\Delta + \Delta_p M_{\Delta_p}}$$

with $N_{\Delta_p} = 0$, $M_{\Delta_p} = W_p$, and the rest of the quantities defined as above. The goal is to find the largest $\gamma^{-1}$ such that $G_a$ can be stabilized for all $\|\Delta\| < 1$ and $\|\Delta_p\| < \gamma^{-1}$. Figure 3 shows the model and makes the rank one structure apparent.


Figure  3:  Rank  One  Structure  of  the  Model 


5.2  Solution  of  the  Rank  One  Synthesis  Problem 

The general rank one synthesis result was solved by Rantzer and Megretski [23] who derived a convex parameterization of all robustly stabilizing controllers for rank one uncertainty lying in a convex set. We will state this result in the form specialized for the FPM. Let the nominal model, $B(\theta_c)/A$, be denoted by $G$ and define

$$\Phi(\alpha, \beta) = \sup_{\omega \in [0, 2\pi]} \frac{|W_p([\alpha + \beta] G + \alpha)|(e^{i\omega})}{\mathrm{Re}\{\alpha(e^{i\omega})\} - \|\mathrm{Re}\{N_\delta[\alpha + \beta]\}(e^{i\omega})\|_1 - |N_\Delta(\alpha + \beta)|(e^{i\omega})}$$

where $\alpha$ is a positive real transfer function and $\beta$ is any stable transfer function. The robust control performance problem described in the preceding section is equivalent to minimizing the functional $\Phi(\alpha, \beta)$ over $\alpha$ and $\beta$. Although it is a tedious exercise, one can show that the above functional is indeed quasiconvex in $(\alpha, \beta)$. The denominator being nonnegative is also a convex constraint and so we get a quasiconvex optimization subject to a convex constraint. An $\epsilon$-optimal solution can be found in finite time using standard methods for convex optimization [2].


6  FPM  Step  Two:  Inner  Loop 

Having mapped the priors into a FPM set and having a solution to the robust control problem, one can try using an iterative scheme based on the FPM and ROS with a performance objective which is implied by keeping the weighted sensitivity transfer function small. We now describe such a scheme in detail.

The partitioning will be performed with respect to $\Theta$ while $W\Delta$ is assumed to represent the inherent nonparametric uncertainty and will remain fixed in size. Thus, we will refer to $\Theta$ as the model set and suppress the $W\Delta$ part which is fixed for each parameter value in $\Theta$.

The  iterative  procedure  based  on  ROS  consists  of  the  following  steps. 

1.  Label the initial model set $\Theta_0$ and set $k = 0$.

2.  Can the desired performance be achieved for $\Theta_k$ by some $K_k$? If yes, go to (4).

3.  Refine $\Theta_k$ in the following way (to achieve better performance):

(a)  Find $j$ such that the performance is most sensitive with respect to the parameter $\theta_j$.

(b)  Split $\Theta_k$ along the $j$-th dimension, resulting in the two sets $X_0$ and $X_1$, with $\Theta_k = X_0 \cup X_1$.

(c)  (Skip if $k = 0$) If $X_0$ is smaller than the smallest allowable partition size we invalidate $\Theta_k$ by decrementing $k$ by 1, and go to (2).

(d)  Find $q \in \{0, 1\}$ such that the best performance which can be achieved for $X_q$ is better than the one for $X_{1-q}$. Let $K_{k+1}$ be the controller which delivers this performance to $X_q$.

(e)  Set $\Theta_k = X_{1-q}$, $\Theta_{k+1} = X_q$, increment $k$ by 1, and go to (2).

4.  Connect $K_k$ to the plant and test for performance.

5.  If the performance is satisfied, stop.

6.  If $k > 0$ invalidate $\Theta_k$ by decrementing $k$ by 1 and go to (2). Otherwise, choose a new model parameterization and go to (1).

Figure  4:  2D  Iteration  Example
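The control flow of the iterative procedure can be sketched with an explicit stack of parameter boxes. This is a simplified illustration, not the implementation used for the results in this paper: the ROS design is replaced by a user-supplied callable `design(box)` returning a pair (achievable performance, controller), the plant experiment by `experiment_ok(controller)`, the sensitivity-based choice in step 3(a) by a longest-side rule, and a too-small set at $k = 0$ is treated as invalidation of the entire model set. All of these names and simplifications are assumptions made for the sketch.

```python
def iterative_search(theta0, gamma_des, design, experiment_ok, min_size):
    """Optimistic bisection search over boxes given as lists of (lo, hi)."""
    stack = [list(theta0)]                       # step 1: Theta_0, k = 0
    while True:
        box = stack[-1]
        gamma, controller = design(box)          # step 2
        if gamma <= gamma_des:
            if experiment_ok(controller):        # steps 4 and 5
                return box, controller
            if len(stack) == 1:                  # step 6 with k = 0
                return None
            stack.pop()                          # step 6: invalidate Theta_k
            continue
        # Step 3(a) stand-in: split along the longest side.
        j = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
        lo, hi = box[j]
        if (hi - lo) / 2.0 < min_size:           # step 3(c)
            if len(stack) == 1:
                return None                      # entire model set invalidated
            stack.pop()
            continue
        mid = 0.5 * (lo + hi)
        halves = [box[:j] + [(lo, mid)] + box[j + 1:],   # step 3(b)
                  box[:j] + [(mid, hi)] + box[j + 1:]]
        halves.sort(key=lambda b: design(b)[0])          # step 3(d)
        stack[-1] = halves[1]                    # step 3(e): Theta_k = X_{1-q}
        stack.append(halves[0])                  #            Theta_{k+1} = X_q

# Toy stand-ins: predicted performance improves as the box shrinks, and the
# experiment succeeds when the box center is close to a hidden parameter.
true_theta = (0.3, 0.7)

def toy_design(box):
    center = tuple(0.5 * (lo + hi) for lo, hi in box)
    return 1.0 + max(hi - lo for lo, hi in box), center

def toy_experiment(controller):
    return max(abs(c - t) for c, t in zip(controller, true_theta)) < 0.2

found = iterative_search([(0.0, 1.0), (0.0, 1.0)], 1.3, toy_design, toy_experiment, 0.05)
failed = iterative_search([(0.0, 1.0), (0.0, 1.0)], 1.0, toy_design, toy_experiment, 0.2)
```

Each split replaces the top of the stack by the less promising half and pushes the more promising half, so the stack grows by one entry per split.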

This procedure has several nice properties. As discussed in Section 3.2, by choosing the smallest allowable partition size to be nonzero, we are guaranteed termination in finite time. Every time a set is split, the memory requirement is only increased by one unit (containing the center and side lengths information, for example) so there is no geometric or exponential explosion in required memory. The search is optimistic, always seeking the best set in the partition. At first thought it seems that this may potentially exhibit very bad worst-case behavior. For if the only controller which achieves the performance for the actual process is one that is designed for a "bad" set, the "good" sets will have to be invalidated first. However, the "good" sets will be invalidated quickly because they will typically be larger and will not need to be split as many times as the "bad" sets. Figure 4 illustrates how the iterations might proceed in the case when $\Theta$ has dimension 2. In this example, the shaded box 4 is invalidated, the counter is decremented from 4 to 3, and the procedure resumes by focusing on box 3.

The computationally difficult steps are steps 2, 3d, and possibly 3a. Note
that  in  steps  2  and  3d,  we  are  trying  to  synthesize  controllers  meeting  either 
the  desired  or  the  best  possible  performance  levels,  with  step  3d  having 
to  solve  two  such  problems.  Step  3a  which  computes  the  sensitivity  of 
performance  with  respect  to  each  parameter  is  fairly  easy  to  compute  in 
the  special  case  of  ROS  and  FPM.  The  solution  is  given  by  the  following 
result. 
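Lemma 6.1 Assume that the ROS solution gives a feasible pair $(\alpha(z), \beta(z))$ as well as the worst frequency, $\omega_0$, which maximizes the functional. Then the parameter which has the greatest impact on performance is given by $\theta_{k_{max}}$, where

$$k_{max} = \arg\max_{k \in [0, m-1]} \left| \mathrm{Re}\left\{ e^{ik\omega_0} \, \frac{\beta(e^{i\omega_0}) + \alpha(e^{i\omega_0})}{A(e^{i\omega_0})} \right\} \right| \qquad (8)$$

Proof. Given the feasible pair $(\alpha(z), \beta(z))$ and the worst frequency, $\omega_0$, $\Phi$ can be viewed as a function depending only on the $\eta_k$'s. Thus, we can write

$$\Phi(\eta) = \frac{a_0}{a_1 - \|\mathrm{Re}(N_\delta[\beta + \alpha])(e^{i\omega_0})\|_1}$$

where $a_0$ and $a_1$ are two real constants. Recall that

$$N_\delta(z) = \frac{[\eta_0 \ \ \eta_1 z \ \cdots \ \eta_{m-1} z^{m-1}]^T}{A(z)}$$

Using simple calculus, one can show that

$$\frac{\partial \Phi}{\partial \eta_k} \propto \frac{\partial \|\mathrm{Re}(N_\delta[\beta + \alpha])(e^{i\omega_0})\|_1}{\partial \eta_k}$$

Writing out $\|\mathrm{Re}(N_\delta[\beta + \alpha])(e^{i\omega_0})\|_1$ explicitly and using the fact that $\eta_k > 0$ gives

$$\frac{\partial}{\partial \eta_k} \|\mathrm{Re}(N_\delta[\beta + \alpha])(e^{i\omega_0})\|_1 = \frac{\partial}{\partial \eta_k} \sum_{l=0}^{m-1} \left| \mathrm{Re}\left\{ \eta_l e^{il\omega_0} \, \frac{\beta(e^{i\omega_0}) + \alpha(e^{i\omega_0})}{A(e^{i\omega_0})} \right\} \right| = \left| \mathrm{Re}\left\{ e^{ik\omega_0} \, \frac{\beta(e^{i\omega_0}) + \alpha(e^{i\omega_0})}{A(e^{i\omega_0})} \right\} \right|$$

and the result follows. □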
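The selection rule of Lemma 6.1 amounts to evaluating one complex number per parameter at the worst frequency. The sketch below assumes the rule takes the form $k_{max} = \arg\max_k |\mathrm{Re}\{e^{ik\omega_0}(\beta + \alpha)/A\}|$ evaluated at $z = e^{i\omega_0}$; the transfer functions are passed as plain Python callables, and the particular `alpha`, `beta` and `A_poly` below are hypothetical placeholders rather than the output of an ROS design.

```python
import cmath

def most_sensitive_parameter(alpha, beta, A, omega0, m):
    """Return (k_max, scores) with scores[k] = |Re{ z0^k (beta + alpha)/A }| at z0 = e^{i w0}."""
    z0 = cmath.exp(1j * omega0)
    common = (beta(z0) + alpha(z0)) / A(z0)
    scores = [abs((z0 ** k * common).real) for k in range(m)]
    k_max = max(range(m), key=scores.__getitem__)
    return k_max, scores

# Placeholder transfer functions, for illustration only.
alpha = lambda z: 1.0 + 0.0 * z
beta = lambda z: 0.5 / z
A_poly = lambda z: z * z - 1.9289 * z + 0.9422

k_max, scores = most_sensitive_parameter(alpha, beta, A_poly, omega0=0.2, m=2)
```

Because the common factor $(\beta + \alpha)/A$ is shared by all parameters, only the phase rotation $e^{ik\omega_0}$ distinguishes them, which is why this step is described in the text as fairly easy to compute.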


6.1  Worst  Case  Complexity 

In this section we consider some issues related to complexity of the iterative scheme and derive some worst-case bounds on the number of experiments and ROS designs. The types of complexity results we are after address the worst case behavior of the scheme with respect to the number of computations and required time for the scheme to terminate. The behavior really depends on the desired performance level, $\gamma^*$, and the initial model set. We begin with the following notation.

Assume that the initial set $\Theta_0 \subset \mathbb{R}^m$ is a hypercube with side lengths $s_k$. Let the desired smallest partition size be $r_{des}$. Next define the number of resolution levels

$$q = \left\lceil \log_2 \frac{s_k}{r_{des}} \right\rceil$$

and so the actual smallest partition size is given by $r_p = s_k 2^{-q}$. At each level $k \in [0, q]$ we have a partition $\Sigma_k$ of $\Theta_0$ consisting of $2^{km}$ subsets.

We  now  define  #D  as  the  total  number  of  ROS  designs  and  #E  as  the 
total  number  of  experiments  performed.  As  before,  we  assume  that  the 
existence  of  a  controller  means  that  such  a  controller  can  result  from  a  ROS 
design.  We  can  now  state  the  main  results. 

Theorem 6.2 The total number of experiments is bounded as

$$\#E \le 2^{mq}$$

with equality only if $\gamma^*$ cannot be achieved for any partition $\Sigma_k$, $k \le (q - 1)$, but can be achieved for every subset in $\Sigma_q$.

Proof. The equality is easy to see. If $\gamma^*$ can be achieved for some subset $U_j \in \Sigma_k$, then we perform an experiment for $U_j$ and so $\#E$ is incremented by 1. However, at the end, we either quit, or invalidate $U_j$, but in either case we will not perform experiments on at least two more sets from $\Sigma_q$ that are contained in $U_j$. Thus, we see that at any level $k < q$, we subtract from the worst case $\#E$ if we perform an experiment and therefore the worst case $\#E$ occurs if we perform experiments on all, and only, the subsets of $\Sigma_q$. □


Theorem 6.3 Given the definitions above,

$$\#D \le 2^{mq+1} - 1$$

Moreover, there exists a (difficult enough) performance objective, $\gamma^* > 0$ s.t. equality holds.

Proof. Clearly, the worst-case occurs when the performance is so difficult that it cannot be achieved for any subset in $\Sigma_q$. This means that eventually, for every subset of every partition, a ROS design will have to be performed. But these are not the only sets that will have to be operated on. Going from partition $\Sigma_k$ to $\Sigma_{k+1}$ requires $m 2^{km}$ splits ($m$ for each set), and each split causes the number of designs to increase by two. Since we are considering the case where the entire set will be invalidated, we can reorder the whole procedure and assume that the splits are done in such a way that the resulting sets are in the $\sigma$-algebra generated by $\Sigma_{k+1}$. This means that the total number of designs is given by

$$\#D \le \sum_{k=0}^{qm} 2^k = 2^{mq+1} - 1$$

□

This number is larger than if we had simply considered all the sets in each $\Sigma_k$. This is given by $(2^{m(q+1)} - 1)/(2^m - 1)$.
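The counts appearing in Theorems 6.2 and 6.3 are easy to tabulate. The sketch below assumes the number of resolution levels is obtained by rounding $\log_2$ of the ratio of side length to desired partition size up to an integer; the function names are illustrative only.

```python
import math

def resolution_levels(side, r_des):
    """Number of bisection levels q, rounding log2(side / r_des) up."""
    return max(0, math.ceil(math.log2(side / r_des)))

def experiment_bound(m, q):
    """Worst-case number of experiments: one per subset of the finest partition."""
    return 2 ** (m * q)

def design_bound(m, q):
    """Worst-case number of ROS designs: sum_{k=0}^{qm} 2^k."""
    return 2 ** (m * q + 1) - 1

def sets_in_all_partitions(m, q):
    """Total number of sets over the partitions at levels 0..q."""
    return sum(2 ** (k * m) for k in range(q + 1))

m, q = 2, resolution_levels(1.0, 0.125)
counts = (experiment_bound(m, q), design_bound(m, q), sets_in_all_partitions(m, q))
```

For a two-parameter model split at most three times per dimension, this gives at most 64 experiments and 127 designs, compared with 85 sets when only the sets of each partition level are counted.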

7  Illustrative  Example 

In  this  section  we  present  an  example  which  illustrates  the  iterative  scheme 
described  in  the  previous  sections.  It  is  assumed  that  the  plant  is  known  to 
consist  of  a  second  order  lightly  damped  mode  with  two  flexible  modes  at 
higher frequencies. The lightly damped mode is known to have a transfer function with denominator

$$s^2 + 2\zeta\omega_n s + \omega_n^2$$

where $\underline{\omega} \le \omega_n \le \bar{\omega}$, $\underline{\zeta} \le \zeta \le \bar{\zeta}$ and it is known that $\underline{\omega} = 0.7$, $\bar{\omega} = 0.8$, $\underline{\zeta} = 0.2$, and $\bar{\zeta} = 0.3$. It is known that the two other modes occur at frequencies of approximately 8 and 12 rad/s. We can discard these modes and represent them with unmodeled dynamics of the form $W\Delta$. The next two figures show the full and simplified plant. After converting everything to discrete time ($T_{sam} = 0.15$ s) we model the simplified plant by the fixed pole approximation. The simplified plant is given by the following.

$$\Pi_a(z) = \frac{0.0121 z + 0.0119}{z^2 + a_1 z + a_0}, \qquad a_1 \in [-1.9456, -1.9122], \quad a_0 \in [0.9274, 0.9570]$$

This means that the fixed pole model is of the form

$$\Pi_\theta = \frac{\sum_{k=0}^{m-1} \theta_k z^k + (\epsilon^* + DW)\Delta}{(z^2 - 1.9289 z + 0.9422)^{\lfloor m/2 \rfloor}}$$

where $\epsilon^*$ is obtained from Section 4.1 and $D(z)$ is the denominator polynomial $(z^2 - 1.9289 z + 0.9422)^{\lfloor m/2 \rfloor}$. The $\epsilon^*$ represents the error between the FPM and simplified plant, while the $DW$ term is the error between the full order and simplified plant.
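The fixed pole locations used here are those of the center of the denominator parameter intervals, as described in Section 4.1. The short check below recomputes that center from the stated intervals and confirms that the resulting poles lie inside the unit circle; it uses only the numbers given above.

```python
import cmath

a1_range = (-1.9456, -1.9122)
a0_range = (0.9274, 0.9570)

def center(interval):
    """Midpoint of a closed interval (lo, hi)."""
    lo, hi = interval
    return 0.5 * (lo + hi)

def quadratic_roots(a1, a0):
    """Roots of z^2 + a1 z + a0."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a0)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

a1_c, a0_c = center(a1_range), center(a0_range)
poles = quadratic_roots(a1_c, a0_c)
pole_radius = [abs(p) for p in poles]
```

The center values reproduce the coefficients $-1.9289$ and $0.9422$ of the fixed denominator, and the poles form a complex pair with squared magnitude equal to the constant coefficient $a_0$, reflecting the lightly damped mode.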

We now assume that the actual plant is given by

$$\frac{0.00181 z^5 - 0.0026 z^4 + 0.0055 z^3 - 0.0087 z^2 + 0.0005 z + 0.0053}{z^6 - 2.4015 z^5 + 2.5568 z^4 - 1.8369 z^3 + 0.9620 z^2 - 0.4987 z + 0.2362}$$

For visual presentation purposes we consider a fixed pole model of order $m = 2$. The corresponding $\epsilon^*$ is 0.0084. The ideal performance objective is assumed to be good tracking of certain duration step inputs; however, for the robust control design we will use small $\ell_2$ gain of the weighted sensitivity transfer function as the design objective. We consider a weighting function which will allow designs of fairly demanding bandwidth (i.e., beyond the first lightly damped mode). This weighting function is given by

$$W_p = \frac{z - 0.2152}{z - 0.9984} .$$

7.1  Iterative  Scheme  Simulation  Results 

We now demonstrate a few examples of the iterative scheme. We take the two dimensional parametric uncertainty set to be the smallest hypercube bounding the set which is asymptotically given by the robust set membership identification algorithm given in [18]. This also serves to show the potential improvement in performance which can be achieved by the iterative scheme. To get an idea of how the scheme might proceed, we compute a number of ROS controllers for various grid refinements and test all of these controllers on the plant. This only gives a rough idea of what the scheme might do because we only consider squares, not rectangles, which will arise due to the sets being split in one dimension at a time. This, however, gives a global picture which shows how the predicted robust performance based on the models compares with the actual performance achieved with the plant. These grids are shown in the following figures. The numbers inside the model boxes correspond to the minimum $\gamma$ achieved for those boxes, while the numbers inside the corresponding plant boxes show the actual $\gamma$ (i.e., $\|W_p S\|_\infty$) that those controllers achieve for the actual plant. Note that if one asks for performance level $\gamma > 1.66$, the predicted and actual performance are fairly well correlated and the optimistic search is extremely efficient. However, one can see that as desired performance improves, we must go to finer grids where the performance is not as well correlated with the predicted robust performance and some invalidation will occur.


Figure  6:  Grids  Showing  Predicted  and  Achieved  Performance


We now show the evolution of the iterative scheme for various desired performance levels, $\gamma_{des}$. This is shown in Figures 8 through 11. The smallest partition size is chosen such that the space is split at most three times (Grid 3 of Figure 7 is the finest allowed partition). The lightly shaded boxes are the ones under current consideration and the dark shaded boxes are the ones that have been invalidated. The values $\gamma$ correspond to the achievable robust performance for the lightly shaded box, while $\gamma_p$ corresponds to the performance level achieved when this controller is applied to the actual process. The first execution of the scheme uses $\gamma_{des} = 2.5$ and the results are shown in Figure 8.

Figure  7:  Grids  Showing  Predicted  and  Achieved  Performance

Figure  11:  Iterative  Scheme:  steps  9-12  ($\gamma_{des} = 1.25$)

Figure  12:  Actual  Sensitivity  Plot  for  Final  Controller

The evolution agrees with the data from Grid 1 in Figure 6 and one can see that the algorithm marches towards the box which predicts $\gamma = 1.66$ and the actual performance is satisfied ($\gamma_p = 1.16$). The next example considers the case when $\gamma_{des} = 1.25$. In this case, shown in Figures 9 through 11, the optimistic search leads to the upper right corner of Grid 2 and we see that the actual performance will miss the desired value of 1.25 so the upper right corner is invalidated. In addition, the box (two boxes down from upper right box) having $\gamma = 1.189$ is invalidated because the actual performance ($\gamma_p = 1.263$) misses 1.25. Finally, we come to the box which has $\gamma = 1.249$, and the actual performance is met ($\gamma_p = 1.22$). The sensitivity plot of the actual performance and the one guaranteed by the controller for the final box is shown in Figure 12. When the desired performance is $\gamma = 1.2$, the entire model set is invalidated. To achieve better performance than this one can use the 4th order model. Running the iterative scheme with this model and a desired $\gamma$ of 1.0, the scheme terminates after 11 iterations with a predicted $\gamma = 0.989$ and the achieved $\gamma_p = 0.987$.

8  Discussion 

The framework proposed in this paper is very general in the sense that it is valid for any mutually consistent model parameterization, robust control design and performance objective. The model set transformation problem for the FPM and a special prior model set (i.e., stable and uncertain poles) was considered. This problem may also be formulated using the gap metric [8] which would allow unstable prior model sets. At this time a scheme based on ROS may be the least conservative because of the lack of conservatism in the rank one synthesis solution. There is, however, a price to be paid for using a simple model such as the FPM. First, the ROS with performance objective given in Section 5.1 is limited to multi-input single-output (MISO) systems. Second, forcing the prior information to be mapped into a FPM can introduce conservatism in the form of large unmodeled dynamics, which will limit achievable performance. This conservatism can be reduced if one is willing to partition the unmodeled dynamics as well. In other words, one can try to extend the scheme based on ROS and FPM by considering the model $\{\theta^T H + \epsilon\Delta\}$ where $\epsilon$ is not fixed but can also be invalidated. One must be careful in this case since the sets $\{\epsilon\Delta : \epsilon \in [0, \eta)\}$ and $\{\epsilon\Delta : \epsilon \in (\eta, \epsilon_1)\}$ are not disjoint even though the values of $\epsilon$ are disjoint. Finally, if the entire model set is invalidated one can either change the performance objective or the model. The examples illustrate that increasing the complexity of the model allows one to achieve better performance through the iterative scheme.

9  Conclusion 

This  paper  presented  a  new  framework  for  iterative  modeling  and  control. 
The  philosophy  of  this  framework  is  a  very  different  way  of  viewing  models 
and  their  role  in  designing  controllers  for  uncertain  systems.  The  model  is 
viewed  as  a  tool  used  to  describe  the  unknown  process  and  really  depends  on 
prior  information,  available  control  design  tools  and  other  modeling  prefer¬ 
ences.  The  approach  described  in  this  paper  was  an  iterative  procedure  for 
refining  the  uncertainty  set  via  robust  control  based  model  invalidation  and 
can  be  viewed  as  a  systematic  way  of  efficiently  searching  for  a  controller 
delivering  a  certain  desired  level  of  performance  to  the  unknown  process.  In 
this  way  it  is  possible  to  invalidate  the  model  if  it  does  not  facilitate  design 
of  a  controller  which  also  provides  good  performance  for  the  actual  process. 
The  result  of  an  iterative  scheme  in  this  framework  is  that  either  the  perfor¬ 
mance  goal  will  be  met  or  the  entire  uncertainty  set  will  be  invalidated  in 
accordance  with  our  modeling  and  control  method  prejudice.  An  iterative 
scheme based on a special fixed pole model structure and rank one mixed $\mu$ synthesis control design was described in detail and a specific example was
used  to  illustrate  the  proposed  scheme. 

10  Acknowledgments 

This work was supported in part by the Air Force Office of Scientific Research under Grant AFOSR-91-0368, by the National Science Foundation under Grant NSF 9157306-ECS, and by C.S. Draper Laboratory under Grant DL-H-467128 and Draper IR&D Project No. 438.


References 

[1]  B. Anderson and R. Kosut. "Adaptive Robust Control: On-Line Learning". pages 297-298, Brighton, England, December 1991.

[2]  S.P. Boyd and C.H. Barratt. Linear Controller Design: Limits of Performance. Prentice-Hall, 1991.


[3]  S.P. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.

[4]  M.A. Dahleh and J. Doyle. "From Data to Control". In Proc. Workshop on Modeling of Uncertainty in Control Systems. Springer-Verlag, 1992.

[5]  M.A. Dahleh and M. Khammash. "Controller Design for Plants with Structured Uncertainty". Automatica, 21, January 1993.

[6]  J. Doyle and G. Stein. "Beyond Singular Values and Loop Shapes". Journal of Guidance and Control, 14(1):5-16, January 1991.

[7]  N. Elia, P. Young, and M.A. Dahleh. "Robust Performance for Fixed Inputs". In Proc. 1994 Conference on Decision and Control, Orlando, FL., December 1994.

[8]  T. Georgiou and M. Smith. "Optimal Robustness in the Gap Metric". IEEE Transactions on Automatic Control, 35:673-686, June 1990.

[9]  M. Gevers. "Connecting Identification and Robust Control: A New Challenge". Technical Report Rep. 91.48, Universite Catholique de Louvain, Louvain, Belgium, 1991.

[10]  G. Gu and P. Khargonekar. "Linear and Nonlinear Algorithms for Identification in $H_\infty$ With Error Bounds". IEEE Transactions on Automatic Control, 37(7):953-964, July 1992.

[11]  A.J. Helmicki, C. Jacobson, and C. Nett. "Control Oriented System Identification in $H_\infty$". IEEE Transactions on Automatic Control, 36(10):1163-1176, October 1991.

[12]  M. Khammash. "Robust Steady State Tracking". In Proc. 1994 American Control Conf., Baltimore, MD., 1994.

[13]  P. Khargonekar. "System Identification in Frequency Domain: Theory and Examples". Proceedings Conf. Feedback Control, Nonlinear Systems, and Complexity, May 1994.

[14]  R. Kosut, M. Lau, and S. Boyd. "Set-Membership Identification of Systems with Parametric and Nonparametric Uncertainty". IEEE Trans. on Auto. Control, 37(7):929-941, July 1992.


31 


[151  J.  Krause.  P.P.  Khargonekar.  and  G.  Stein.  "Robust  Adaptive  Con¬ 
trol:  Stability  and  Asymptotic  Performance'.  IEEE  Transactions  on 
Automatic  Control.  37f 3 ) :3 1 6—33 1 .  March  1992. 

[16]  J.  Krause.  G.  Stein,  and  P.P.  Khargonekar.  "Robust  Performance  of 
Adaptive  Controllers  with  General  Uncertainty  Structure  .  In  Proc. 
1990  Conference  on  Decision  and  Control.  Honolulu.  Hawaii.  December 
1990. 

[17]  W.  Lee.  B.  Anderson.  R.  Kosut.  and  I.  Mareels.  "On  Adaptive  Robust 
Control  and  Control- Relevent  System  Identification’.  In  Proc.  1992 
American  Control  Conference,  pages  2834-2841.  Chicago,  IL.  June 
1992. 

[18]  M.  Livstone.  Identification.  Robust  Adaptation  and  Iterative  Schemes. 
PhD  thesis.  MIT.  Cambridge,  Massachusetts.  October  1994. 

[19]  J.M.  Maciejowski.  Multivariable  Feedback  Design.  Addison-Wesley. 
1989. 

[20]  P.M.  Makila  .  "Robust  Identification  and  Galois  Sequences'  .  Technical 
Report  Rep.  91-1.  Abo  Akademi  (Swedish  University  of  Abo),  Abo, 
Finland.  January  1991. 

[21]  A.  nesterov  and  Y.  Nemirovski.  Interior  Point  Methods  in  Convex 
Optimization.  SIAM,  1994. 

[22]  Iv.  Poolla.  P.  Khargonekar.  A.  Tikku.  J.  Krause,  and  K.  Nagpal.  “A 
Time-Domain  Approach  to  Model  Validation”.  IEEE  Transactions  on 
Automatic  Control.  39(5 ) :35 1—359.  May  1994. 

[23]  A.  Rantzer  and  A.  Megretski.  “A  Convex  Parameterization  of  Robustly 
Stabilizing  Controllers''.  Technical  report,  The  Royal  Institute  of  Tech¬ 
nology,  Stockholm.  Sweden,  1993. 

[24]  R.  Schrama  and  P.  Van  den  Hof.  "An  Iterative  Scheme  for  Identi¬ 
fication  and  Control  Design  Based  on  Coprime  Factorizations”.  In 
Proc.  1992  American  Control  Conference,  pages  2842-2846,  Chicago, 
IL.,  June  1992. 

[25]  R.P.J.  Schrama.  “Accurate  Models  for  Control  Design:  The  Necessity 
of  an  Iterative  Scheme”.  IEEE  Transactions  on  Automatic  Control. 
22(2):  173-179.  July  1992. 


32 


[26]  R.S.  Smith  and  J.C.  DovLe.  “Model  Invalidation-A  Connection  Between 
Robust  Control  and  Identification”.  Proc.  1989  American  Control  Con¬ 
ference.  pages  1435-1440.  June  1989. 

[271  R.S.  Smith  and  J.C.  Doyle.  -Model  Validation:  A  Connection  Between 
Robust  Control  and  Identification".  IEEE  Transactions  on  Automatic 
Control.  37f 7):942-952.  July  1992. 

[28]  D.  Tse.  M.A.  Dahleh.  and  J.  Tsisiklis.  “Optimal  Asymptotic  Identi¬ 
fication  Under  Bounded  Disturbances”.  Proc.  1992  American  Control 
Conference.  Chicago ,  IL.  pages  679-685.  July  1992. 

[29]  P.M.  Young.  "Robustness  with  Parametric  and  Dynamic  Uncertainty”. 
PhD  thesis.  California  Institute  of  Technology.  Pasadena.CA.  May 
1993. 

[30]  Z.  Zang.  R.  Bitmead.  and  M.  Gevers.  “tf2  Iterative  Model  Refinement 
and  Control  Robustness  Enhancement”.  In  Proc.  1991  Conference  on 
Decision  and  Control ,  pages  279-284,  Brighton,  England.  December 
1991. 

[31]  Z.  Zang,  R.  Bitmead,  and  M.  Gevers.  “Disturbance  Rejection:  On-Line 
Refinement  of  Controllers  by  Closed  Loop  Modelling”.  Technical  Report 
Rep.  92.15,  Universite  Catholique  de  Louvain,  Louvain,  Belgium,  1992, 

[32]  K.  Zhou.  J.C.  Doyle,  and  K.  Glover.  Control  Theory.  Preprint, 
*  1994. 


33 


A  Framework  for  Robust  Parametric  Set 
Membership  Identification 


Mitchell M. Livstone and Munther A. Dahleh


Abstract 

This paper proposes a new framework for studying robust parametric
set membership identification. We derive some new results on the
fundamental limitations of algorithms in this framework, given a particular
model structure. The new idea is to quantify uncertainty only
with respect to the (finite dimensional) parametric part of the model
and not the (fixed size) unmodeled dynamics. Thus, the measure of
uncertainty is different from the measures used in previous robust identification
work where system norms are used to quantify uncertainty.
As an example, the results are used to assess the fidelity of a certain approximate
robust parametric set membership identification algorithm.


1  Introduction 

In  the  past  half  decade  there  has  been  much  research  activity  in  the  area 
of  robust  system  identification,  otherwise  known  as  control-oriented  and 
control-relevant  system  identification.  The  motivation  can  be  attributed 
to  new  advances  in  robust  control  theory  which  did  not  interface  well  with 
the  existing  theory  of  classical  system  identification.  In  particular,  robust 
control  requires  the  plant  to  be  described  by  a  nominal  model  perturbed 
by  some  bounded  uncertainty  which  may  or  may  not  have  structure.  This 
uncertain  set  of  systems  is  assumed  to  contain  the  true  plant  and  the  robust 
control  theory  provides  methods  for  synthesizing  controllers  which  achieve 
certain  performance,  robustly,  for  the  entire  uncertainty  set.  This  model  set 
requirement  is  not  satisfied  by  the  classical  identification  algorithms  which 
typically  fix  a  parametric  model  structure  and  then  perform  some  kind  of 
regression to get a value for the parameters. This yields a single, finite
dimensional,  identified  system.  Thus,  the  main  focus  of  current  research 
in  robust  identification  has  been  the  formulation  of  algorithms  that  yield 
nominal  plants  along  with  measures  of  uncertainty  which  are  well  suited 
for  existing  robust  control  methodologies;  hence,  the  terms  control-oriented 
and  control-relevant.  Since  these  algorithms  yield  uncertain  sets  of  plants, 
all  of  these  robust  identification  algorithms  can  be  classified  as  some  kind  of 
set  membership  identification  (SMID)  algorithms. 

The  formulations  and  algorithms  essentially  differ  in  the  types  of  a  priori 
information  assumed  about  the  model  set  and  disturbances.  The  model  set 
assumption  is  partially  driven  by  the  description  of  modeling  uncertainty 
required by the robust control design. The frequency domain algorithms
in [7, 6] provide nominal models along with unstructured uncertainty which
is bounded in $H_\infty$ and thus provides the correct description for $H_\infty$ control
theory [5, 3]. The time domain algorithms in [11, 13] provide nominal models
with uncertainty bounds in the $\ell_1$ norm and are well suited for the $\ell_1$ control
theory [2]. Of course, the $\ell_1$ norm provides a (potentially conservative)
bound for the $H_\infty$ norm, so the time domain algorithms can also be used in
$H_\infty$ robust control, however conservative the bounds may be.
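
The relation between the two norms can be checked numerically. The following is a minimal sketch, assuming a hypothetical FIR impulse response `h` and a frequency-grid estimate of the $H_\infty$ norm; the function names are illustrative and do not come from the paper.

```python
import cmath
import math


def l1_norm(h):
    """l1 norm of an impulse response: the sum of the absolute tap values."""
    return sum(abs(x) for x in h)


def hinf_norm_grid(h, n_grid=2048):
    """Grid estimate of the H-infinity norm: max |H(e^{jw})| for w in [0, pi].

    For real coefficients the magnitude is symmetric in w, so [0, pi] suffices.
    The grid value never exceeds the true supremum.
    """
    best = 0.0
    for i in range(n_grid + 1):
        w = math.pi * i / n_grid
        H = sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))
        best = max(best, abs(H))
    return best


# Hypothetical impulse response, chosen only for illustration.
h = [1.0, -0.5, 0.25, 0.3]
print("l1 norm:", l1_norm(h))
print("H-infinity estimate:", hinf_norm_grid(h))
```

For an impulse response whose taps all share one sign, the two norms coincide (the peak gain occurs at zero frequency); otherwise the $\ell_1$ norm is typically strictly larger, which is the conservatism mentioned above.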

The algorithms mentioned above are formulated in a worst-case asymptotic
setting. The formulation is worst-case with respect to the plant and
noise. In other words, given the worst allowable noise and the worst plant in
the original model set, the identified set must contain the true plant. Furthermore,
the algorithm should be asymptotically convergent in the sense
that in the limit as the noise $\to 0$ and the cardinality of the data and the nominal plant
order both $\to \infty$, the worst-case identification error (i.e., distance between
nominal and true plants) goes to zero.

More  recently,  some  robust  extensions  of  the  parametric  set  membership 
identification  setup  have  appeared  in  the  literature  [15,  9,  8].  The  roots  of 
parametric  set  membership  identification  (PSMID)  can  be  traced  back  to 
the  late  1960’s  in  the  work  of  Schweppe  [12]  and  Bertsekas  [1]  who  studied 
state  estimation  under  unknown  but  bounded  disturbances.  These  ideas 
were later applied to system identification (parameter estimation) by Fogel [4],
and a steady flow of papers on SMID has persisted ever since. The
models used in these papers were simple ARMA models with some output
additive (unknown but bounded) noise. Since these models are of fixed finite
dimension, they are not very useful for robust control. This motivated
Younce, Krause and others [15, 9, 8] to consider a model with unstructured
uncertainty. Algorithms which use this model set will be referred to as robust
parametric SMID (RPSMID), and an example of a prior model set is
the following set with additive unstructured uncertainty.

$$\mathcal{M}_0 = \{ G_\theta + W\Delta : \theta \in \Theta_0 \subset \mathbb{R}^m,\ \|\Delta\|_\infty \le 1 \} \qquad (1)$$

where $G_\theta(z)$ is a SISO, rational transfer function whose polynomial coefficients
are elements of the parameter vector $\theta$, and $W$ is a known (assumed)
weighting function for the uncertainty.

Surprisingly, essentially all of the work in PSMID over the last 15 years
has  focused  on  the  construction  of  algorithms  and  computations,  with  very 
little  mention  of  convergence  issues.  A  formal  framework  which  can  address 
issues  such  as  fundamental  limitations,  uncertainty  and  optimal  inputs  seems 
to  be  missing.  In  the  special  case  of  FIR  models,  the  work  of  Tse  [13]  can 
be  applied,  although  it  yields  conservative  results. 

In this paper we introduce a new, robust parametric set membership
identification (RPSMID) framework and derive some results on the fundamental
limitations of algorithms in this framework. The development is
similar to the work of Tse, et al. [13], and the results can be viewed as generalizations
of some of the results therein. This paper is organized as follows.
The RPSMID problem is formulated in the following section. The rest of
the paper is concerned with the diameter of the uncertainty set and optimal
inputs which can shrink the uncertainty set to its theoretical minimum. In
Sections 3-5 we present some results on the size of the worst-case uncertainty
sets and optimal inputs for two special cases: noise-free ($\delta = 0$) and
purely parametric ($\Delta = 0$). Section 6 contains the corresponding results for
the general case. Section 7 illustrates how these results can be used to assess
the fidelity of an approximate RPSMID algorithm.

2  Problem  Formulation 

Let  the  linear  time  invariant  plant  model  set  be  given  by: 

$$\mathcal{M}_0 = \{ G(\theta, \Delta) : \theta \in \Theta_0 \subset \mathbb{R}^m,\ \|\Delta\| \le 1 \} \qquad (2)$$

where $\Delta$ is defined on some Banach space ($H_\infty$ or $\ell_1$). Given an input $u \in \ell_p$
with $\|u\|_p \le 1$, the experiment is defined by

$$y = h_p * u + d, \qquad h_p \in \mathcal{M}_0,\ \|d\| \le \delta \qquad (3)$$

We assume that the parameters to be identified are given by $\theta$, while the
inherent unstructured uncertainty in the model is captured by $\Delta$. The exact
model structure is embedded in the functional form of $G$. For example, an
additive uncertainty structure can be expressed by

$$G(\theta, \Delta) = \frac{B_\theta}{A_\theta} + W\Delta .$$

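A set identification algorithm, $\phi$, maps the experiment data up to time $n$
and the priors to an identified set $\mathcal{M}_n$:

$$\phi(P_n(u, y), \Delta, \delta, \Theta_0) = \mathcal{M}_n$$

The plant membership set, $\mathcal{S}_n$, is given by all plants consistent with the
observed data (up to time $n$) and the priors.

$$\mathcal{S}_n = \{ G(\theta, \Delta) : P_n(y - G(\theta, \Delta) * u - d) = 0 \ \text{for some } \theta \in \Theta_0,\ \|\Delta\| \le 1,\ \|d\| \le \delta \} \qquad (4)$$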

The parametric membership set, $\Theta_n$, is the set of all parameter values consistent
with the data and priors, and is given by

$$\Theta_n = \{ \theta \in \mathbb{R}^m : G(\theta, \Delta) \in \mathcal{S}_n \} \qquad (5)$$

Note that $\mathcal{S}_n \subset \{ G(\theta, \Delta) : \theta \in \Theta_n,\ \|\Delta\| \le 1 \} = \bar{\mathcal{S}}_n$, which is actually the
uncertainty set of plants generated by $\Theta_n$ and all possible $\Delta$ in the unit ball.
An algorithm is said to be RPSMID when it satisfies

$$\mathcal{S}_n \subset \mathcal{M}_n .$$

This simply means that the identified set contains the consistent set. The
fidelity of the algorithm can then be judged based on how tight an overbound
it provides for the consistent set.

In  the  RPSMID  framework,  the  identification  error  for  any  RPSMID 
algorithm is bounded from below by the diameter of $\mathcal{S}_n$. However, since the
size of $\Delta$ is fixed, it only makes sense to study the convergence of the size
of the parametric set $\Theta_n$.

We now define uncertainty in this framework and examine its asymptotic
worst-case behavior. Uncertainty will be measured only with respect to the
parametric part of the model. Thus, the uncertainty associated with the set
$\mathcal{S}_n$ will be the same as the uncertainty in $\bar{\mathcal{S}}_n$ (the set of plants generated by $\Theta_n$
and all $\Delta$ in the unit ball). This uncertainty is defined
as the diameter of $\Theta_n$ with respect to the metric $\rho$ on $\mathbb{R}^m$, defined in the
following way.

$$\mathrm{diam}(\Theta) = \sup_{\theta_1 \in \Theta}\ \sup_{\theta_2 \in \Theta} \rho(\theta_1, \theta_2) \qquad (6)$$

It will be understood that (with slight abuse of notation) $\mathrm{diam}(\mathcal{S}_\infty)$ means
$\mathrm{diam}(\Theta_\infty)$ as defined above. Using this definition we define the worst-case
diameter of uncertainty in the standard way.

$$D(u, \mathcal{M}_0, \Delta, \delta) = \sup_{h \in \mathcal{M}_0}\ \sup_{\|d\| \le \delta} \mathrm{diam}\big(\mathcal{S}_\infty(\mathcal{M}_0, u, u * h + d, \Delta, \delta)\big)$$

At this level of generality it is difficult to say anything about the diameter
of uncertainty. We will impose more structure and try to derive more
specialized results. In particular, we will consider the additive uncertainty
structure

$$G(\theta, \Delta) = G(\theta) + W\Delta$$

where $G(\theta)$ is affine in $\theta$ and $\Theta_0 = \theta_c + \tilde{\Theta}_0$, where $\tilde{\Theta}_0$ is a balanced and
convex set in $\mathbb{R}^m$. Furthermore, we require that $W$ is stable and has a stable
inverse in the space where it will be defined. These assumptions will hold
for the remainder of this chapter.


3  Purely  Parametric  Case 

We first consider the case where $\Delta = 0$ so the uncertainty enters only
through the disturbance, $d$. We derive lower bounds for the diameter of
uncertainty and a few special cases. Later we consider the existence of inputs
which can asymptotically shrink the uncertainty to these lower bounds.
The  main  result  is  given  by  the  following  theorem  which  provides  a  lower 
bound  for  the  diameter  of  the  uncertainty  set. 


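Theorem 3.1 If $\exists\, \theta_1, \theta_2 \in \Theta_0$ (closed and bounded) s.t. $\|G(\theta_1) - G(\theta_2)\|_1 > 2\delta$,
then for any input $u \in \ell_p$,

$$D(u, \mathcal{M}_0, 0, \delta) \ge 2 \sup \{ \|\theta - \theta_c\| : \|G(\theta) - G(\theta_c)\|_1 \le \delta \}$$

where $\theta_c$ is the analytic center of $\Theta_0$:

$$\inf_{\theta_1 \in \mathbb{R}^m}\ \sup_{\theta \in \Theta_0} \|\theta - \theta_1\| = \sup_{\theta \in \Theta_0} \|\theta - \theta_c\|$$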

Remark 3.2 The norms in the theorem are not specified because any norm
defined on $\mathbb{R}^m$ can be used to measure $\theta$ and the diameter is defined with
respect to the metric induced by this norm. The norm used to measure the
operators is the same as the norm used for $\Delta$.

Before proving the theorem we need a lemma and a fact. We can assume,
without loss of generality, that $G(\theta)$ is linear (not affine) in $\theta$.

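Lemma 3.3

$$\sup_{h \in \mathcal{M}_0}\ \sup_{\|d\| \le \delta} \mathrm{diam}\big(\Theta_\infty(\mathcal{M}_0, u, u * h + d, \delta)\big) = \mathrm{diam}\big(\Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)\big)$$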

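Proof. The above means that given any $\theta_p \in \Theta_0$, $\|d\| \le \delta$ and $\epsilon > 0$,
if $\exists\, \theta_1, \theta_2 \in \Theta_\infty(\mathcal{M}_0, u, u * G(\theta_p) + d, \delta)$ with $\rho(\theta_1, \theta_2) = \epsilon$, then $\exists\, \theta_1', \theta_2' \in
\Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)$ with $\rho(\theta_1', \theta_2') \ge \epsilon$. We will now show that one can
always choose $\theta_1' = \theta_c + \frac{\theta_1 - \theta_2}{2}$ and $\theta_2' = \theta_c - \frac{\theta_1 - \theta_2}{2}$. We first show that $\theta_1', \theta_2' \in
\Theta_0$ by showing that $\frac{\theta_1 - \theta_2}{2} \in \tilde{\Theta}_0$. Because $\tilde{\Theta}_0$ is a balanced set, $\theta_1 - \theta_c$ and $-(\theta_2 - \theta_c)$ are
both in it. Combining this with the fact that $\tilde{\Theta}_0$ is also convex gives $\frac{\theta_1 - \theta_2}{2} \in \tilde{\Theta}_0$,
and hence $\theta_1', \theta_2' \in \Theta_0$. We now show that $\theta_1'$ and $\theta_2'$ are also in $\Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)$.
Define $\tilde{\theta} = \theta_1 - \theta_2$. Since $\theta_1, \theta_2 \in \Theta_\infty(\mathcal{M}_0, u, u * h + d, \delta)$ implies

$$h * u + d = G(\theta_1) * u + d_1 = G(\theta_2) * u + d_2$$

for some $\|d_1\| \le \delta$ and $\|d_2\| \le \delta$, this means that

$$G(\tilde{\theta}) * u = d_2 - d_1 .$$

Now, if the plant is given by $h = G(\theta_c)$ and $d = 0$ we have

$$y = G(\theta_c) * u = G(\theta_c + \tfrac{\tilde{\theta}}{2}) * u + d_1' = G(\theta_c - \tfrac{\tilde{\theta}}{2}) * u + d_2'$$

which implies that $G(\tfrac{\tilde{\theta}}{2}) * u = -d_1'$ and $G(\tfrac{\tilde{\theta}}{2}) * u = d_2'$. This shows that one
can take $d_1' = \tfrac{1}{2}(d_1 - d_2)$ and $d_2' = -d_1'$. Since $\|d_1'\| = \|d_2'\| \le \delta$, this shows
that $\theta_1', \theta_2' \in \Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)$. Furthermore, $\rho(\theta_1', \theta_2') = \rho(\tfrac{\tilde{\theta}}{2}, -\tfrac{\tilde{\theta}}{2}) = \epsilon$.

□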


Fact 3.4 If $\Theta$ is a balanced and convex set, then

$$\sup_{\theta_1, \theta_2 \in \Theta} \|\theta_1 - \theta_2\| = 2 \sup_{\theta \in \Theta} \|\theta\|$$
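
Fact 3.4 can be checked numerically on a small example. The sketch below assumes a hypothetical balanced polytope in the plane, given by a vertex list closed under negation, and uses the 1-norm; since a norm is convex, both suprema over the polytope are attained at vertices, so checking the vertices is enough. All names are illustrative.

```python
from itertools import product


def norm1(v):
    """1-norm of a vector given as a tuple of floats."""
    return sum(abs(x) for x in v)


def diameter(points, norm):
    """Largest distance between any two points, in the metric induced by norm."""
    return max(
        norm([a - b for a, b in zip(p, q)])
        for p, q in product(points, repeat=2)
    )


def radius(points, norm):
    """Largest norm of any single point."""
    return max(norm(p) for p in points)


# Hypothetical vertex set: adding the negated points makes the set balanced.
half = [(1.0, 0.5), (-0.3, 2.0), (0.7, -1.1)]
vertices = half + [tuple(-x for x in p) for p in half]

print("diameter:", diameter(vertices, norm1))
print("2 * radius:", 2 * radius(vertices, norm1))
```

The diameter is reached by a point of largest norm and its negation, which is exactly why the set has to be balanced for the identity to hold.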

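Proof. (Thm) Lemma 3.3 above shows that the worst case uncertainty
occurs when the plant is $G(\theta_c)$ and $d = 0$. Then given any $\|u\| \le 1$, for
each $\theta \in \Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)$ we have $\|(G(\theta_c) - G(\theta)) * u\| \le \delta$, which is
implied by $\|G(\theta_c) - G(\theta)\|_1 \le \delta$. This means that

$$\hat{\Theta}_c := \{ \theta : \|G(\theta_c) - G(\theta)\|_1 \le \delta \} \subset \Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)$$

for any $\|u\| \le 1$. This implies that for any $\|u\| \le 1$

$$D(u, \mathcal{M}_0, 0, \delta) = \mathrm{diam}\big(\Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \delta)\big) \ge \mathrm{diam}(\hat{\Theta}_c)$$

and since $\{ \theta : \|G(\theta_c) - G(\theta)\|_1 \le \delta \}$ is a ($\theta_c$) translation of a balanced and
convex set, $\mathrm{diam}(\hat{\Theta}_c) = 2 \sup \{ \|\theta - \theta_c\| : \theta \in \hat{\Theta}_c \}$ (by Fact 3.4). □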


Corollary 3.5 If

$$G(\theta) = \frac{\sum_{k=0}^{m-1} \theta_k z^{-k}}{A(z)}$$

and diam is measured w.r.t. the $\ell_1$-norm, then

$$D(u, \mathcal{M}_0, 0, \delta) \ge 2\delta / \beta$$

where $\beta = \|1/A\|_1$.
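
As a rough illustration of Corollary 3.5, the bound $2\delta/\beta$ can be evaluated by truncating the impulse response of $1/A(z)$. This is only a sketch under assumptions: the denominator coefficients, the noise level, and the truncation length are hypothetical, and $A$ is assumed stable so that the truncated sum approximates $\|1/A\|_1$.

```python
def inverse_impulse_response(a_coeffs, n_terms):
    """First n_terms of the impulse response of 1/A(z).

    A(z) = a0 + a1 z^-1 + ... ; the response h solves sum_k a_k h[n-k] = delta[n].
    """
    h = []
    for n in range(n_terms):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, len(a_coeffs) - 1) + 1):
            acc -= a_coeffs[k] * h[n - k]
        h.append(acc / a_coeffs[0])
    return h


def diameter_lower_bound(a_coeffs, delta, n_terms=2000):
    """Evaluate 2*delta/beta with beta approximated by a truncated l1 norm of 1/A."""
    beta = sum(abs(x) for x in inverse_impulse_response(a_coeffs, n_terms))
    return 2.0 * delta / beta


# Hypothetical example: A(z) = 1 - 0.5 z^-1, so 1/A has taps 0.5**n and beta = 2.
print(diameter_lower_bound([1.0, -0.5], delta=0.1))
# FIR case A(z) = 1: beta = 1 and the bound reduces to 2*delta.
print(diameter_lower_bound([1.0], delta=0.1))
```

Moving the pole of $1/A$ toward the unit circle increases $\beta$ and therefore weakens the guaranteed lower bound.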


Note that this bound may not be tight, but as $A \to 1$ it becomes tight and
agrees with a result by Tse, et al. [14]. It is also interesting to compare
this approximation to the one which would be obtained by applying Tse's
result directly to this case. Tse's result can be applied here when $G$ is FIR,
so it is necessary to multiply through by $A(z)$. After defining $\tilde{y} = A y$, the
experiment becomes

$$\tilde{y} = g * u + A * d, \qquad \|d\|_\infty \le \delta .$$

Now one can get an inner approximation to the uncertainty set:

$$\{ g : \|g * u - \tilde{y}\|_\infty \le \|A\|_1 \delta \} \subset \{ g : \exists\, \|d\|_\infty \le \delta \ \text{s.t.}\ \|g * u - \tilde{y}\|_\infty \le \|A d\|_\infty \}$$

and so $D(u, \mathcal{M}_0, 0, \delta) \ge 2\|A\|_1 \delta$. Comparing this result to the corollary, one
can see that this result is at least as conservative as the approximation in
the corollary. This is due to the fact that given any invertible operator, $A$,
and an induced norm $\|\cdot\|$, the inequality

$$\|A\| \ge \frac{1}{\|A^{-1}\|}$$

always holds.


4  Noise  Free  Case 


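We now consider the case where $\delta = 0$ so the inherent uncertainty enters
only through $\Delta$. The main result gives a lower bound for the diameter of
the uncertainty set.

Theorem 4.1 If $\exists\, \theta_1, \theta_2 \in \Theta_0$ s.t. $\left\| \frac{G(\theta_1) - G(\theta_2)}{W} \right\| > 2$, then for any input
$u \in \ell_p$,

$$D(u, \mathcal{M}_0, \Delta, 0) \ge 2 \sup \left\{ \|\theta - \theta_c\| : \left\| \frac{G(\theta) - G(\theta_c)}{W} \right\| \le 1 \right\} \qquad (7)$$

where $\theta_c$ is the analytic center of $\Theta_0$.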

Before proving the theorem we need one lemma. We again assume,
without loss of generality, that $G(\theta)$ is linear (not affine) in $\theta$.

Lemma 4.2

$$\sup_{h \in \mathcal{M}_0} \mathrm{diam}\big(\Theta_\infty(\mathcal{M}_0, u, u * h, \Delta)\big) = \mathrm{diam}\big(\Theta_\infty(\mathcal{M}_0, u, u * G(\theta_c), \Delta)\big)$$

Proof. The proof is similar to the proof of Lemma 3.3 and is therefore
omitted. □


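Proof. (Thm) Lemma 4.2 above shows that the worst case plant is given
by $\Delta = 0$ and $\theta_{act} = \theta_c$. Once this is shown, assume that the plant is $G(\theta_c)$.
Then given any input, if $G(\theta_c) = G(\theta_1) + W\Delta$ for some $\|\Delta\| \le 1$, it follows
that $\theta_1 \in \Theta_\infty$. This shows that

$$\left\{ \theta : \left\| \frac{G(\theta) - G(\theta_c)}{W} \right\| \le 1 \right\} \subset \Theta_\infty$$

but since this set is convex and balanced (with translation), Fact 3.4 shows
that

$$\mathrm{diam} \left\{ \theta : \left\| \frac{G(\theta) - G(\theta_c)}{W} \right\| \le 1 \right\} = 2 \sup \left\{ \|\theta - \theta_c\| : \left\| \frac{G(\theta) - G(\theta_c)}{W} \right\| \le 1 \right\}$$

This completes the proof. □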

We  now  consider  a  special  case. 

Corollary 4.3 If $G = \frac{\sum_{k=0}^{m-1} \theta_k z^{-k}}{A(z)}$, and the diameter and $\Delta$ are measured w.r.t.
the $\ell_1$ norm on $\mathbb{R}^m$ and $\ell_1$, respectively, then

$$D(u, \mathcal{M}_0, \Delta, 0) \ge \frac{2}{\beta}$$

where $\beta = \|1/(AW)\|_1$.

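Proof. We simply note that

$$\left\| \frac{\sum_{k=0}^{m-1} \theta_k z^{-k}}{AW} \right\|_1 \le \left\| \frac{1}{AW} \right\|_1 \left\| \sum_{k=0}^{m-1} \theta_k z^{-k} \right\|_1 = \left\| \frac{1}{AW} \right\|_1 \|\theta\|_1 .$$

This means that

$$\left\{ \theta : \left\| \frac{1}{AW} \right\|_1 \|\theta\|_1 \le 1 \right\} \subset \left\{ \theta : \left\| \frac{\sum_{k=0}^{m-1} \theta_k z^{-k}}{A(z) W(z)} \right\|_1 \le 1 \right\}$$

and the result follows after applying Theorem 4.1. □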


5  Optimal  Inputs 

In this section we show that there exist inputs which can decrease the diameter
of the uncertainty set to the theoretical lower bounds derived in the
previous sections. We use Galois sequences and arguments similar to those
of Tse, et al. [14] and Makila [11]. A Galois sequence of order $n$ is a minimum
length binary sequence which contains every possible subsequence of length
$\le n$. We consider the parametric case first.
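
A sequence with the property described above can be generated greedily. The sketch below is an illustration under assumptions, not the construction used in the cited works: it uses the classical "prefer-one" rule, which yields what is commonly called a binary de Bruijn sequence written out linearly, and maps the bits to $\pm 1$. The function name `galois_sequence` is hypothetical.

```python
def galois_sequence(n):
    """Return a +1/-1 sequence of length 2**n + n - 1 in which every sign
    pattern of length n appears as a contiguous window.

    Greedy 'prefer-one' construction: start from n zeros, then append a 1
    whenever the resulting length-n window is new, otherwise a 0 if that
    window is new, and stop when neither extension gives a new window.
    """
    bits = [0] * n
    seen = {tuple(bits)}
    while True:
        for b in (1, 0):
            window = tuple(bits[len(bits) - n + 1:] + [b])
            if window not in seen:
                seen.add(window)
                bits.append(b)
                break
        else:
            break  # both extensions repeat an existing window
    return [1 if b else -1 for b in bits]


u = galois_sequence(3)
windows = {tuple(u[i:i + 3]) for i in range(len(u) - 2)}
print(u)
print(len(u), len(windows))
```

Every pattern shorter than $n$ is a prefix of some pattern of length $n$, so it also appears in the sequence.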

5.1  Optimal  Inputs  -  Parametric  Case 

In the purely parametric case ($W = 0$) the theoretical lower bound is given
by

$$2 \sup \{ \|\theta - \theta_c\| : \|G(\theta) - G(\theta_c)\|_1 \le \delta \}$$

We can assume WLOG that $\theta_c = 0$ and define

$$\alpha = 2 \sup \{ \|\theta\| : \|G(\theta)\|_1 \le \delta \}$$

The main result shows that this lower bound is in fact tight when a Galois
sequence input, $u^*$, is used.

Theorem 5.1 Let all definitions of Section 3 hold. Then

$$D(u^*, \mathcal{M}_0, 0, \delta) \le \alpha$$

Proof. The theorem will be proved for $G$ linear (not affine) in $\theta$, but the
extension to the affine case is trivial. Choose any $\theta \in \Theta_\infty(\mathcal{M}_0, u^*, 0, \delta)$. This
implies that $\|G(\theta) * u^*\|_\infty \le \delta$, and since $G(\theta)$ is linear in $\theta$,

$$G(\theta) = \sum_{k=0}^{N} g_k(\theta) z^{-k}, \qquad g_k(\theta) \ \text{linear in } \theta \ \ \forall k$$

Since the input $u^*$ is a Galois sequence, $\exists\, m > 0$ s.t.

$$[u_m \ u_{m+1} \ \cdots \ u_{m+N}] = [\mathrm{sgn}(g_N(\theta)) \ \mathrm{sgn}(g_{N-1}(\theta)) \ \cdots \ \mathrm{sgn}(g_0(\theta))]$$

This means that

$$|(G(\theta) * u^*)_{m+N}| = \left| \sum_{k=m}^{m+N} g_{m+N-k}(\theta)\, u_k \right| = \sum_{k=m}^{m+N} |g_{m+N-k}(\theta)| = \|G(\theta)\|_1$$

Putting this together gives

$$\delta \ge \|G(\theta) * u^*\|_\infty \ge |(G(\theta) * u^*)_{m+N}| = \|G(\theta)\|_1$$

This means that $\|\theta\| \le \alpha/2$ and thus $D(u^*, \mathcal{M}_0, 0, \delta) \le \alpha$. □
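
The sign-matching step in the proof can be illustrated numerically. The sketch below assumes a hypothetical FIR response `g` with nonzero taps and reuses the greedy "prefer-one" construction as a stand-in for a Galois sequence; it then confirms that the peak output magnitude equals the $\ell_1$ norm of `g`. All names are illustrative.

```python
def galois_sequence(n):
    """+1/-1 sequence containing every sign pattern of length n as a window
    (greedy 'prefer-one' construction, used here as an assumed stand-in)."""
    bits = [0] * n
    seen = {tuple(bits)}
    while True:
        for b in (1, 0):
            window = tuple(bits[len(bits) - n + 1:] + [b])
            if window not in seen:
                seen.add(window)
                bits.append(b)
                break
        else:
            break
    return [1 if b else -1 for b in bits]


def peak_output(g, u):
    """max over t of |(g * u)_t| for an FIR response g, with u zero before t = 0."""
    peak = 0.0
    for t in range(len(u)):
        y = sum(g[k] * u[t - k] for k in range(len(g)) if t - k >= 0)
        peak = max(peak, abs(y))
    return peak


# Hypothetical FIR impulse response with nonzero taps.
g = [0.5, -1.0, 0.25]
u = galois_sequence(len(g))
print("peak output:", peak_output(g, u))
print("l1 norm of g:", sum(abs(x) for x in g))
```

Somewhere in `u` the window equals the reversed sign pattern of `g`, so every product in the convolution sum is nonnegative at that time instant and the output reaches the $\ell_1$ norm.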

5.2  Optimal  Inputs  -  Noise  Free  Case 
Recall  that  in  this  case  the  experiment  is  given  by 

$$y = [G(\theta) + W\Delta] * u$$

and assume without loss of generality that the set $\Theta_0$ is centered at zero
($\theta_c = 0$). Define the theoretical lower bound in Theorem 4.1 (with $\theta_c = 0$)
as

$$\alpha = 2 \sup \left\{ \|\theta\| : \left\| \frac{G(\theta)}{W} \right\| \le 1 \right\}$$

The main result shows that this lower bound is in fact tight when a Galois
sequence input, $u^*$, is used.
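
Theorem 5.2 Let all definitions of Section 4 hold. Then

$$D(u^*, \mathcal{M}_0, \Delta, 0) \le \alpha$$

Proof. The proof considers the linear case, but it is easy to see that the
result holds for affine $G$ as well. Choose any $\theta \in \Theta_\infty(\mathcal{M}_0, u^*, \Delta, 0)$. This
implies that

$$\left\| \frac{G(\theta)}{W} * u^* \right\|_\infty \le \|\Delta * u^*\|_\infty$$

Furthermore, define $\tilde{G}(\theta) = \frac{G(\theta)}{W}$, which belongs to $\ell_1$ and is also linear in $\theta$.
This gives

$$\tilde{G}(\theta) = \sum_{k=0}^{\infty} \tilde{g}_k(\theta) z^{-k}, \qquad \tilde{g}_k(\theta) \ \text{linear in } \theta \ \ \forall k$$

and for any $\epsilon > 0$, $\exists\, N > 0$ s.t. $\sum_{k=N+1}^{\infty} |\tilde{g}_k| \le \epsilon/2$. Since the input $u^*$ is a
Galois sequence, $\exists\, m > 0$ s.t.

$$[u_m \ u_{m+1} \ \cdots \ u_{m+N}] = [\mathrm{sgn}(\tilde{g}_N(\theta)) \ \mathrm{sgn}(\tilde{g}_{N-1}(\theta)) \ \cdots \ \mathrm{sgn}(\tilde{g}_0(\theta))]$$

This means that for every $\epsilon > 0$, $\exists\, m, N > 0$ s.t.

$$|(\tilde{G}(\theta) * u^*)_{m+N}| = \left| \sum_{k=0}^{m+N} \tilde{g}_k(\theta)\, u_{m+N-k} \right|$$

$$= \left| \sum_{k=0}^{N} \tilde{g}_k(\theta)\, u_{m+N-k} + \sum_{k=N+1}^{N+m} \tilde{g}_k(\theta)\, u_{m+N-k} \right|$$

$$= \left| \sum_{k=0}^{N} \mathrm{sgn}(\tilde{g}_k(\theta))\, \tilde{g}_k(\theta) + \sum_{k=N+1}^{N+m} \tilde{g}_k(\theta)\, u_{m+N-k} \right|$$

$$\ge \sum_{k=0}^{N} |\tilde{g}_k(\theta)| - \sum_{k=N+1}^{N+m} |\tilde{g}_k(\theta)|$$

$$\ge \|\tilde{G}(\theta)\|_1 - \epsilon$$

Putting this together gives

$$\left\| \frac{G(\theta)}{W} \right\|_1 - \epsilon \le \left| \left( \frac{G(\theta)}{W} * u^* \right)_{m+N} \right| \le \left\| \frac{G(\theta)}{W} * u^* \right\|_\infty \le \|\Delta * u^*\|_\infty \le \|\Delta\|_1 \|u^*\|_\infty \le \|\Delta\|_1 \le 1$$

Since $\epsilon > 0$ is arbitrary, this means that $\|\theta\| \le \alpha/2$ and thus $D(u^*, \mathcal{M}_0, \Delta, 0) \le \alpha$. □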


6  General  Model 

We are now ready to consider the general case with uncertainty as well
as a disturbance. This case is more difficult and we cannot get an exact
expression which is not a function of $\Delta$ or $d$. One can again show that the
worst case situation is $\theta = \theta_c$, $\Delta = 0$, and $d = 0$. We will assume WLOG
that $\theta_c = 0$. This gives the following expression.

$$D(u, \mathcal{M}_0, \Delta, \delta) \ge 2 \sup \{ \|\theta\| : \|G(\theta) + W\Delta\| \le \delta \ \text{for some } \|\Delta\| \le 1 \}$$

We can use an argument entirely similar to the one used in the two previous
sections on optimal inputs, and show that using a Galois input we can
match the sign of any $N$ consecutive elements of the impulse response of
$G(\theta) + W\Delta$. This will show that for this optimal input, the set

$$\Theta^*(\Delta, \delta) = \{ \theta : \|G(\theta) + W\Delta\| \le \delta \ \text{for some } \|\Delta\| \le 1 \}$$

is equivalent to the consistent parameter set $\Theta_\infty$. We can similarly define
the corresponding sets for the two special cases

$$\Theta^*(\Delta, 0) = \left\{ \theta : \left\| \frac{G(\theta)}{W} \right\| \le 1 \right\}$$

and

$$\Theta^*(0, \delta) = \{ \theta : \|G(\theta)\|_1 \le \delta \}$$

We can now show the following result.

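Lemma 6.1

$$\Theta^*(\Delta, 0) \oplus \Theta^*(0, \delta) \subset \Theta^*(\Delta, \delta) \qquad (8)$$

where $\oplus$ is the Minkowski set addition.

Proof. Choose any $\theta_1 \in \Theta^*(\Delta, 0)$ and $\theta_2 \in \Theta^*(0, \delta)$. This means that
for any $\|u\| \le 1$, there is some $\|\Delta\| \le 1$ and some $\|d\| \le \delta$ such that
$[G(\theta_1) + W\Delta] * u = 0$ and $G(\theta_2) * u - d = 0$. But this implies that $[G(\theta_1 +
\theta_2) + W\Delta] * u - d = 0$, which means that $\theta_1 + \theta_2 \in \Theta^*(\Delta, \delta)$. □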

Unfortunately, the diameter of this set sum cannot be tightly bounded from
below. In fact, it is easy to derive the following bounds.

$$\max\big(\mathrm{diam}(\Theta^*(\Delta, 0)),\ \mathrm{diam}(\Theta^*(0, \delta))\big) \le \mathrm{diam}\big(\Theta^*(\Delta, 0) \oplus \Theta^*(0, \delta)\big)$$

and

$$\mathrm{diam}\big(\Theta^*(\Delta, 0) \oplus \Theta^*(0, \delta)\big) \le \mathrm{diam}(\Theta^*(\Delta, 0)) + \mathrm{diam}(\Theta^*(0, \delta))$$
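
Both bounds on the diameter of a Minkowski sum can be attained, which is why nothing tighter holds in general. The following sketch uses hypothetical symmetric point sets in the plane (the endpoints of centered line segments) and the $\infty$-norm; the sets and function names are illustrative and are not taken from the paper.

```python
from itertools import product


def norm_inf(v):
    """Infinity norm of a vector given as a tuple of floats."""
    return max(abs(x) for x in v)


def diameter(points):
    """Largest pairwise distance in the infinity norm."""
    return max(
        norm_inf([a - b for a, b in zip(p, q)])
        for p, q in product(points, repeat=2)
    )


def minkowski_sum(set_a, set_b):
    """All pairwise sums a + b with a in set_a and b in set_b."""
    return [tuple(x + y for x, y in zip(a, b)) for a in set_a for b in set_b]


A = [(1.0, 0.0), (-1.0, 0.0)]          # diameter 2
B_orth = [(0.0, 0.5), (0.0, -0.5)]     # diameter 1, orthogonal to A
B_par = [(0.5, 0.0), (-0.5, 0.0)]      # diameter 1, aligned with A

print(diameter(minkowski_sum(A, B_orth)))  # equals max(diam A, diam B)
print(diameter(minkowski_sum(A, B_par)))   # equals diam A + diam B
```

When the two sets extend along different coordinates, the sum is no wider than the larger set; when they are aligned, the diameters add.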

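We can also show that

$$\Theta^*(\Delta, \delta) \subset \{ \theta : \|G(\theta)\|_1 \le \delta + \|W\|_1 \} \qquad (9)$$

This gives the following bounds for $\mathrm{diam}(\Theta^*(\Delta, \delta))$.

Theorem 6.2

$$\max\big(\mathrm{diam}(\Theta^*(\Delta, 0)),\ \mathrm{diam}(\Theta^*(0, \delta))\big) \le \mathrm{diam}(\Theta^*(\Delta, \delta))$$

and

$$\mathrm{diam}(\Theta^*(\Delta, \delta)) \le \mathrm{diam}\big(\{ \theta : \|G(\theta)\|_1 \le \delta + \|W\|_1 \}\big)$$


These bounds are not tight except when $W \to 0$. In fact, when $\delta <
\|W\|_1$ the upper bound can be very poor since the set $\{ \theta : \|G(\theta)\|_1 \le \|W\|_1 \}$
can be much larger than $\left\{ \theta : \left\| \frac{G(\theta)}{W} \right\|_1 \le 1 \right\}$.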


7  Assessing  Fidelity  of  Approximate  Algorithms 

In this section we show an application of the results developed in the previous
sections. In particular we use these results to study the conservatism of a
particular approximate ellipsoid RPSMID algorithm presented in [10]. The
analysis in [10] presents worst-case asymptotic uncertainty results under
optimal inputs. The algorithm itself is not important and we only state the
relevant result. Given the model structure

$$G(\theta, \Delta) = G(\theta) + W\Delta$$

where $G(\theta)$ is affine in $\theta$ and the experiment is given by

$$y = G(\theta, \Delta) u + d, \qquad \|d\|_\infty \le \delta$$

the asymptotic worst-case diameter of the approximate parametric membership
set (an ellipse) measured in the $\infty$ norm on $\mathbb{R}^m$ is bounded by

$$\mathrm{diam}(\mathrm{Ellipse}) \le 2\|A\|_1 (\|W\|_1 + \delta)$$

We will use this in conjunction with the exact asymptotic results developed
in this paper as follows. The worst-case exact and approximate results can
be stated roughly as

$$\mathrm{diam}(\mathrm{exact}) \ge r \quad \text{and} \quad \mathrm{diam}(\mathrm{approx}) \le w$$

and this implies that $\mathrm{diam}(\mathrm{approx}) \le \frac{w}{r}\, \mathrm{diam}(\mathrm{exact})$. This gives an upper
bound for the diameter of the approximate solution as a function of the
exact diameter of uncertainty. The main result is as follows.
Theorem  7.1  For  the  uiodtl  given  by 

y  -  u  ±\ylt  -j-  (I 

/I 

the  worst-case  diameter  of  uncertainty  under  Galois  inputs  is  given  by 
diam(ellipse)  <  (1  +  ||lE||1||W^1|!1)(||A||1||,r1|!1)dmm(0*(A.<5)) 

Proof.  From  [10]  we  know  that  the  diameter  (under  optimal  inputs)  mea¬ 
sured  in  the  oc-norm  satisfies 

diam( Ellipse)  <  2||/1||1( \\W\h  +  6)  (10) 

We  now  derive  an  explicit  lower  bound  for  the  diameter  of  the  exact 
uncertainty  set  using  Lemma  6.1.  Notice  that  the  proof  of  Lemma  6.1  is 
valid  whether  A  is  measured  in  li  or  H0 0.  We  make  the  distinction  between 
these  two  norms  more  explicit  by  defining  the  two  sets 


0M^O)={«: 

G{9) 

W 

-1} 

oo  ) 

and 

0 

» 

*-* 

o 

III 

G(0) 

W 

while  the  definition 

e-((U)  =  {9  :  ||G(9)||,  <  <} 
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holds  as  in  Lemma  6.1.  It  is  easy  to  see  that  ©j^f A, 0)  C  ©jL^fA.O). 
The  above  observation  and  definitions  can  easily  be  used  to  show  that  when 
©X(A,<5)  is  defined  with  \\&\\oz.  <  1  we  have 


ef11(io)®0,(o<f)c0yio)ee,(o,«)c0*(A,«)  (in 

We  also  have 


diam  (0^j(A,O))  > 


2 


> 


2 


> 


2 


and 


[AHO-MU  -  !!(ALF)-I||i  -  l!A-1!!i||VF-1||i 

28 


diam  (0X(O,  <5))  > 


IIA 


-H 


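which we can combine to get

$$\mathrm{diam}(\Theta^*(\Delta,\delta)) \ge \frac{2}{\|A^{-1}\|_1}\,\max\left(\frac{1}{\|W^{-1}\|_1},\ \delta\right) \qquad (12)$$

We now combine Equations 12 and 10 to get

$$\mathrm{diam}(\mathrm{Ellipse}) \le \frac{\delta + \|W\|_1}{\max\left(\frac{1}{\|W^{-1}\|_1},\ \delta\right)}\,\|A\|_1\|A^{-1}\|_1\,\mathrm{diam}(\Theta^*(\Delta,\delta))$$

The result follows from the fact that

$$\frac{\delta + \|W\|_1}{\max\left(\frac{1}{\|W^{-1}\|_1},\ \delta\right)}
= \frac{\delta}{\max\left(\frac{1}{\|W^{-1}\|_1},\ \delta\right)} + \frac{\|W\|_1}{\max\left(\frac{1}{\|W^{-1}\|_1},\ \delta\right)}
\le 1 + \|W\|_1\|W^{-1}\|_1$$

□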


There  will  typically  not  be  a  problem  with  the  W  term  since  it  can 
usually  be  picked  as  a  fairly  well  conditioned  system.  The  A  terms  cannot 
be controlled as easily. If the poles are close to the unit circle, the quantity
$\|A\|_1\|A^{-1}\|_1$ may be large and the approximate solution may be somewhat
conservative in this case.
8 Conclusion


In this paper we have presented a new framework for studying robust
parametric set membership identification. Some concrete results concerning the
diameter of the worst-case uncertainty set were derived for an
affine-in-the-parameters model structure. It was also shown that Galois inputs are
optimal for asymptotically shrinking the worst-case diameter of uncertainty.
These results were then applied to the assessment of fidelity of a certain
approximate robust parametric set membership identification algorithm. It is
not known whether similar results can be developed for more sophisticated
model structures and this is a direction for future research.
Set Membership Identification Algorithms and
Asymptotic Properties

Mitchell M. Livstone and Munther A. Dahleh


Abstract 

This paper addresses the asymptotic worst-case properties of set
membership identification (SMID) algorithms. We first present a set
membership identification algorithm which can be used with a model
structure consisting of parametric and nonparametric uncertainty, as
well as output additive disturbances that are deterministic and
magnitude bounded. This algorithm is then studied in the context of
asymptotic worst-case behavior. We derive a lower bound on the worst-case
achievable identification error, which is measured by the volume of the
identified ellipsoidal uncertainty sets. We then show that there exist
inputs which can shrink the uncertainty sets to this lower bound.


1  Introduction 

The roots of set membership identification can be traced back to the late
1960s in the work of Schweppe [15] and Bertsekas [2], who studied state
estimation under unknown but bounded disturbances. These ideas were
later applied to identification (parameter estimation) by Fogel [6], and a
steady flow of papers on SMID has persisted ever since [7, 13, 14, 1, 16,
9, 5]. Most of the work in SMID over the last 15 years has focused on
the construction of algorithms and computations, with very little mention
of convergence issues, especially optimal input design. The work on SMID
algorithms can be subdivided mainly into two categories: ellipsoids and
polytopes. This work aims at constructing algorithms which tightly bound
the parametric uncertainty set with ellipsoids and polytopes, respectively.
Given the disturbance assumption $\|d\|_\infty \le \delta$, ellipsoid algorithms are more
conservative than their polytope counterparts since the polytopes give the
exact characterization of uncertainty in this case. However, what is lost in
conservatism is gained in computational simplicity. The recursive ellipsoid
algorithms require only matrix multiplication, while the polytope algorithms
typically require solutions to linear programs.

There are essentially two types of model structures considered in the
SMID literature. The first is a purely parametric (ARMA) model with
additive, unknown but bounded disturbances [6, 7, 13, 14, 1, 5]. This includes
a large portion of the SMID work in the literature. The more recent
approach uses a model structure which is parametric along with nonparametric
uncertainty (either additive or multiplicative), but no disturbance [16, 10].
One exception is the work of Kosut et al. [9], which discusses nonparametric
uncertainty and disturbances which are either stochastic or satisfy a spectral
energy bound. The presence of nonparametric uncertainty does not allow
polytope-type algorithms to be used and one is left with either ellipsoid
algorithms for approximate characterizations or infinite dimensional convex
programs for exact characterizations of uncertainty. In this paper we develop
an algorithm for parametric and nonparametric uncertainty with additive,
magnitude bounded noise and study its asymptotic properties in detail.

The  paper  is  organized  as  follows.  First  we  derive  a  recursive  ellipsoid 
algorithm  which  can  be  used  for  the  general  model  described  above.  This 
is  very  similar  to  recursive  least  squares  and  is  a  simple  extension  of  some 
of  the  ellipsoid  algorithms  in  the  literature.  In  Section  3.2  we  study  the 
worst-case  asymptotic  properties  of  this  algorithm.  A  lower  bound  on  the 
worst-case  volume  is  derived  for  a  model  structure  whose  parametric  part 
is linear in the parameters. We next show that using a random binary
sequence input will shrink the worst-case volume to this lower bound, with
probability one, and using a Galois sequence input will surely shrink the
volume asymptotically to this value.

2  Background 

Various models have been used in the formulation of the SMID problem.
The model used in Fogel [6] is

$$y_k = \theta^T\phi_k + d_k \quad\text{with}\quad \|d\|_\infty \le \delta \qquad (1)$$

where $\theta^T = [\theta_1\ \theta_2 \cdots \theta_m]$ is the vector of unknown parameters and
$\phi_k = [y_{k-1} \cdots y_{k-p}\ u_k \cdots]^T$ is the usual regressor vector.
The set of parameters consistent with the single observation at time k
is defined by

$$S_k = \{\theta \in \mathbb{R}^m : |y_k - \theta^T\phi_k| \le \delta\} \qquad (2)$$

and the set consistent with all observations up to, and including, time k is
defined as

$$\Theta_k^* = \bigcap_{i=1}^{k} S_i \qquad (3)$$

The identification goal is to find a set, $\Theta_k$, at each time k, which satisfies
$\Theta_k^* \subset \Theta_k$ and the inclusion is as tight as possible, in some appropriate sense.
It is interesting to note that only the set $\Theta_k$ is updated so the identified
and the original model sets have identical structure. Furthermore, the size
of the uncertainty set is a function of the experiment. This means that
disturbances which are not "worst-case" actually affect the size of parametric
uncertainty. This can be seen from the simple example where only one
parameter has to be identified (i.e., $y_k = \theta u_k + d_k$). In this case, the parametric
uncertainty set is an interval in $\mathbb{R}$. The worst-case disturbance is obviously
zero and one can see that a disturbance sequence which takes on values $-\delta$
and $+\delta$ will shrink the uncertainty set to a single point.
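
The one-parameter example can be checked numerically. The following is a minimal sketch, assuming a constant input $u_k = 1$ and the illustrative values $\theta = 0.5$, $\delta = 0.25$ (neither value comes from the paper); the function name `consistent_interval` is hypothetical. It intersects the single-observation sets of Equation 2 for a scalar parameter.

```python
import math

def consistent_interval(u, y, delta):
    """Intersect S_k = {theta : |y_k - theta*u_k| <= delta} over all samples."""
    lo, hi = -math.inf, math.inf
    for uk, yk in zip(u, y):
        if uk == 0:
            continue  # a zero input carries no information about theta
        a, b = (yk - delta) / uk, (yk + delta) / uk
        if a > b:  # a negative input flips the interval ends
            a, b = b, a
        lo, hi = max(lo, a), min(hi, b)
    return lo, hi

theta, delta = 0.5, 0.25  # illustrative values, not taken from the paper
u = [1.0, 1.0, 1.0, 1.0]

# Zero disturbance: every S_k is the same interval of width 2*delta.
y_zero = [theta * uk for uk in u]
zero_interval = consistent_interval(u, y_zero, delta)

# Disturbance alternating between +delta and -delta.
d_alt = [delta, -delta, delta, -delta]
y_alt = [theta * uk + dk for uk, dk in zip(u, d_alt)]
alt_interval = consistent_interval(u, y_alt, delta)
```

With the zero disturbance the interval never shrinks below width $2\delta$, whereas the alternating disturbance collapses it onto the true parameter after two samples.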

Clearly, the exact solution at time k is an intersection of k sets, and
each set is defined by two supporting hyperplanes. Much of the research has
focused on these exact algorithms [13, 14]. On the other hand, the pointwise
inequality in Equation 2 implies

$$\Theta_N^* \subset \left\{\theta : \sum_{k=1}^{N}\lambda_k(y_k - \theta^T\phi_k)^2 \le \delta^2\sum_{k=1}^{N}\lambda_k\right\} \qquad (4)$$

where $\lambda_k \ge 0$ are free parameters to be picked judiciously. This set defines
an ellipsoid and much of the work over the last 15 years has focused on
efficient computation and reduction in conservatism of the above bound [6,
7, 4]. In particular, Fogel and Huang [7] derive equations for choosing
the parameters $\lambda_k$ to achieve minimum volume (determinant) or minimum
trace ellipsoids. This model structure is not very useful for robust control
because it assumes perfect knowledge of the plant order and relative degree.
This motivated Younce, Krause and others [16, 10, 9] to consider a model
with unstructured uncertainty. The following is a model set with additive
unstructured uncertainty.

$$M_0 = \{G_\theta + W\Delta : \theta \in \Theta_0 \subset \mathbb{R}^m,\ \|\Delta\|_\infty \le 1\} \qquad (5)$$

where $G_\theta$ is a SISO, rational transfer function whose polynomial
coefficients are elements of the parameter vector $\theta$, and $W$ is a known (assumed)
weighting function for the uncertainty. The process set is simply given by

$$y = h * u \quad\text{and}\quad h \in M_0$$

In this case there is no exact characterization of the parametric uncertainty
set in terms of polytopes since the pointwise bound $|(\Delta u)_k| \le |u_k|$ does
not necessarily hold, and a more complicated exact characterization can be
derived [12]. The approximate characterization can be expressed in terms
of ellipsoids.


3  Approximate  Set  Membership  ID 


In  this  section  we  set  up  the  approximate  set  membership  identification 
problem  for  a  particular  model  structure.  We  state  the  recursive  equations 
for  the  noise-free  case  and  then  extend  these  to  the  noisy  case. 


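Let the process model be given by:

$$y = G_{\theta,\Delta}u + \frac{F}{A}d \quad\text{where}\quad G_{\theta,\Delta} \equiv \frac{B_\theta + \Delta W}{A}, \quad \|d\|_\infty \le \delta, \quad \|\Delta\| \le 1, \qquad (6)$$

$$\begin{aligned}
A(z) &= 1 + a_1z^{-1} + a_2z^{-2} + \cdots + a_{m-1}z^{-m+1} \\
B_\theta(z) &= \theta_0 + \theta_1z^{-1} + \theta_2z^{-2} + \cdots + \theta_{m-1}z^{-m+1} \\
W(z) &= w_0 + w_1z^{-1} + w_2z^{-2} + \cdots + w_{m-1}z^{-m+1} \\
F(z) &= f_0 + f_1z^{-1} + f_2z^{-2} + \cdots + f_{m-1}z^{-m+1}
\end{aligned} \qquad (7)$$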

We also define $f = [f_0\ f_1 \cdots f_n]^T$ and $\theta = [\theta_0\ \theta_1 \cdots \theta_{m-1}]^T$. We next show
that the consistent parameter set can be efficiently bounded by an ellipsoid.
Let $\tilde{y}_k = y_k + a_1y_{k-1} + \cdots + a_{m-1}y_{k-m+1} = (Ay)_k$,
$\tilde{u}_k = w_0u_k + w_1u_{k-1} + \cdots + w_{m-1}u_{k-m+1} = (Wu)_k$ and
$\phi_k = [u_k\ u_{k-1} \cdots u_{k-m+1}]^T$. Assuming that the data was generated by such
a model, at each time k we have

$$\tilde{y}_k - \theta^T\phi_k = (\Delta Wu)_k + (Fd)_k \qquad (8)$$
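In this case, weighting the significance of the data at each time step by

$$\lambda_k^{1/2}(\tilde{y}_k - \theta^T\phi_k) = \lambda_k^{1/2}\left((\Delta\tilde{u})_k + (Fd)_k\right), \quad \lambda_k \ge 0 \qquad (9)$$

as is done in Fogel and Huang [7] is not possible since the inequality

$$\sum_{k=1}^{N}(\Delta\tilde{u})_k^2 \le \sum_{k=1}^{N}\tilde{u}_k^2$$

does not hold pointwise. By writing down the running sums we get

$$\left\{\sum_{k=1}^{N}(\tilde{y}_k - \theta^T\phi_k)^2\right\}^{1/2}
= \left\{\sum_{k=1}^{N}\left[(\Delta\tilde{u})_k + (Fd)_k\right]^2\right\}^{1/2}
\le \left\{\sum_{k=1}^{N}(\Delta\tilde{u})_k^2\right\}^{1/2} + \left\{\sum_{k=1}^{N}(Fd)_k^2\right\}^{1/2}$$

Noting that $(Fd)_k^2 = (f_0d_k + f_1d_{k-1} + \cdots + f_nd_{k-n})^2 \le \delta^2\|f\|_1^2$ we get the
following.

$$\sum_{k=1}^{N}(\tilde{y}_k - \theta^T\phi_k)^2 \le \Psi_N^2 \quad\text{where} \qquad (10)$$

$$\Psi_N^2 = \sum_{k=1}^{N}\tilde{u}_k^2 + \delta^2\|f\|_1^2N + 2\delta\|f\|_1N^{1/2}\left\{\sum_{k=1}^{N}\tilde{u}_k^2\right\}^{1/2}$$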

3.1  Recursive  Equations 

In  this  section  we  derive  recursive  equations  for  the  ellipsoid  matrix  as  well 
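as the nominal parameter estimate. We begin by setting $\delta = 0$ and thus
$\Psi_N^2 = \sum_{k=1}^{N}\tilde{u}_k^2$. Equation 10 now simplifies to

$$\sum_{k=1}^{N}(\tilde{y}_k - \theta^T\phi_k)^2 \le \sum_{k=1}^{N}\tilde{u}_k^2$$

For the sake of the derivation we assume that $\sum_{k=1}^{N}\phi_k\phi_k^T$ is invertible; however,
this assumption is not necessary for the recursive algorithm to work
if appropriate initializations are made. Expanding, and defining
$\Gamma_N^{-1} = \sum_{k=1}^{N}\phi_k\phi_k^T$, $\beta_N = \sum_{k=1}^{N}\tilde{y}_k\phi_k$ and $\alpha_N = \sum_{k=1}^{N}(\tilde{y}_k^2 - \tilde{u}_k^2)$, we get the
following.

$$\theta^T\Gamma_N^{-1}\theta - 2\theta^T\beta_N + \alpha_N \le 0$$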
Defining $\theta_N = \Gamma_N\beta_N$, $V_N = \beta_N^T\Gamma_N\beta_N - \alpha_N$ and $P_N = V_N\Gamma_N$ we get the
form for an ellipsoid. The ellipsoidal parameter set at time N is then given
by

$$\Theta_N = \{\theta \in \mathbb{R}^m : (\theta - \theta_N)^TP_N^{-1}(\theta - \theta_N) \le 1\} \qquad (11)$$

We next derive recursions for $\theta$, $\Gamma$ and $V$, which requires some algebraic
manipulation and the matrix inversion lemma (MIL):

$$(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$$

The recursion for $\Gamma_N$ is just a simple application of MIL with $A = \Gamma_N^{-1}$,
$B = \phi_{N+1}$, $C = 1$ and $D = \phi_{N+1}^T$.

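$$\Gamma_{N+1}^{-1} = \Gamma_N^{-1} + \phi_{N+1}\phi_{N+1}^T$$

$$\Gamma_{N+1} = \Gamma_N - \frac{\Gamma_N\phi_{N+1}\phi_{N+1}^T\Gamma_N}{1 + \phi_{N+1}^T\Gamma_N\phi_{N+1}}$$

The equation for $\theta_{N+1}$ is simply

$$\theta_{N+1} = \Gamma_{N+1}\beta_{N+1} = \Gamma_{N+1}(\beta_N + \tilde{y}_{N+1}\phi_{N+1})$$

The recursion is obtained using a few more steps.

$$\begin{aligned}
\theta_{N+1} &= \Gamma_{N+1}(\beta_N + \tilde{y}_{N+1}\phi_{N+1}) \\
&= \Gamma_{N+1}(\Gamma_N^{-1}\theta_N + \tilde{y}_{N+1}\phi_{N+1}) \\
&= \Gamma_{N+1}([\Gamma_{N+1}^{-1} - \phi_{N+1}\phi_{N+1}^T]\theta_N + \tilde{y}_{N+1}\phi_{N+1}) \\
&= \theta_N + \Gamma_{N+1}(\tilde{y}_{N+1} - \phi_{N+1}^T\theta_N)\phi_{N+1}
\end{aligned}$$

The recursion for $V_N$ requires a bit more work and turns out to be

$$V_{N+1} = V_N + \tilde{u}_{N+1}^2 - \frac{(\tilde{y}_{N+1} - \theta_N^T\phi_{N+1})^2}{1 + \phi_{N+1}^T\Gamma_N\phi_{N+1}}$$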

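The computations for the noise-free case are then as follows.

$$\Gamma_{N+1} = \Gamma_N - \frac{\Gamma_N\phi_{N+1}\phi_{N+1}^T\Gamma_N}{1 + \phi_{N+1}^T\Gamma_N\phi_{N+1}} \qquad (12)$$

$$\theta_{N+1} = \theta_N + \Gamma_{N+1}(\tilde{y}_{N+1} - \phi_{N+1}^T\theta_N)\phi_{N+1} \qquad (13)$$

$$V_{N+1} = V_N + \tilde{u}_{N+1}^2 - \frac{(\tilde{y}_{N+1} - \theta_N^T\phi_{N+1})^2}{1 + \phi_{N+1}^T\Gamma_N\phi_{N+1}} \qquad (14)$$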
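
The noise-free recursions for $\Gamma$, $\theta$ and $V$ (Equations 12 through 14) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes $A = W = 1$ (so $\tilde{y} = y$ and $\tilde{u} = u$), a two-parameter FIR plant with $\Delta = 0$ and $d = 0$, and an initialization $\Gamma_0 = 1000I$, $\theta_0 = 0$, $V_0 = 1$. The helper names (`matvec`, `dot`, `ellipsoid_update`, `run_fir_example`) and the input sequence are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ellipsoid_update(Gamma, theta, V, phi, y_t, u_t):
    """One noise-free step: update Gamma, then theta, then V."""
    g = matvec(Gamma, phi)                 # Gamma_N * phi_{N+1}
    denom = 1.0 + dot(phi, g)              # 1 + phi^T Gamma_N phi
    err = y_t - dot(theta, phi)            # prediction error using theta_N
    m = len(phi)
    # Gamma_N is symmetric, so Gamma phi phi^T Gamma = g g^T.
    Gamma_new = [[Gamma[i][j] - g[i] * g[j] / denom for j in range(m)]
                 for i in range(m)]
    gain = matvec(Gamma_new, phi)          # Gamma_{N+1} * phi_{N+1}
    theta_new = [theta[i] + gain[i] * err for i in range(m)]
    V_new = V + u_t ** 2 - err ** 2 / denom
    return Gamma_new, theta_new, V_new

def run_fir_example(theta_true, u, gamma0=1000.0, v0=1.0):
    """Noise-free run with A = W = 1 (assumed), Delta = 0 and d = 0."""
    m = len(theta_true)
    Gamma = [[gamma0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    theta = [0.0] * m
    V = v0
    phi = [0.0] * m                        # regressor, zero before the start
    for uk in u:
        phi = [uk] + phi[:-1]              # [u_k, u_{k-1}, ...]
        yk = dot(theta_true, phi)
        Gamma, theta, V = ellipsoid_update(Gamma, theta, V, phi, yk, uk)
    return Gamma, theta, V

u = [1, 1, -1, -1, 1, 1, -1, -1]           # illustrative binary input
Gamma, theta, V = run_fir_example([1.0, 0.2], u)
```

Because the update of $\theta$ uses $\Gamma_{N+1}$ while the update of $V$ uses $\Gamma_N$ and $\theta_N$, the sketch computes the prediction error and the denominator before overwriting any state.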
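The equations for the noisy case can now be derived as a simple extension
of the above.

In the general case we still get an ellipsoid given by

$$\theta^T\Gamma_N^{-1}\theta - 2\theta^T\beta_N + \alpha_N \le 0$$

with $\Gamma$ and $\beta$ unchanged; however, $\alpha$ is now given by

$$\alpha_N = \sum_{k=1}^{N}(\tilde{y}_k^2 - \tilde{u}_k^2) - N\delta^2\|f\|_1^2 - 2\delta\|f\|_1N^{1/2}\left\{\sum_{k=1}^{N}\tilde{u}_k^2\right\}^{1/2}$$

Noticing that the recursions for $\Gamma$ and $\theta_N$ do not depend on $\alpha$, the recursions
for the noise-free case hold here as well. The recursion for $V_N$ can be
obtained easily from the noise-free case by simply noting that

$$V_{N+1} - V_N = \left(\beta_{N+1}^T\Gamma_{N+1}\beta_{N+1} - \beta_N^T\Gamma_N\beta_N - \tilde{y}_{N+1}^2 + \tilde{u}_{N+1}^2\right) + \delta^2\|f\|_1^2
+ 2\delta\|f\|_1\left[(N+1)^{1/2}\left\{\sum_{k=1}^{N+1}\tilde{u}_k^2\right\}^{1/2} - N^{1/2}\left\{\sum_{k=1}^{N}\tilde{u}_k^2\right\}^{1/2}\right]$$

The part in the parentheses is equivalent to the noise-free case and we end
up with the following. Setting $\kappa_0 = 0$, at time $N + 1$ we have

$$\kappa_{N+1} = \kappa_N + \tilde{u}_{N+1}^2 \qquad (15)$$

$$V_{N+1} = V_N + \tilde{u}_{N+1}^2 - \frac{(\tilde{y}_{N+1} - \theta_N^T\phi_{N+1})^2}{1 + \phi_{N+1}^T\Gamma_N\phi_{N+1}} + \delta^2\|f\|_1^2
+ 2\delta\|f\|_1\left[(N+1)^{1/2}\kappa_{N+1}^{1/2} - N^{1/2}\kappa_N^{1/2}\right] \qquad (16)$$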
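
As a sanity check on the noisy-case recursion, the noise-dependent increments should telescope to the noise-dependent part of the bound in Equation 10, namely $N\delta^2\|f\|_1^2 + 2\delta\|f\|_1N^{1/2}\{\sum_k\tilde{u}_k^2\}^{1/2}$. The sketch below assumes that form of the bound; the function names and the sample values are hypothetical.

```python
import math

def noise_terms_recursive(u_tilde, delta, f_norm1):
    """Accumulate the noise-dependent increments of V_N one sample at a time."""
    kappa = 0.0   # running sum of u_tilde_k^2, starting from zero
    extra = 0.0
    for n, ut in enumerate(u_tilde):      # n = number of samples seen so far
        kappa_next = kappa + ut ** 2
        extra += delta ** 2 * f_norm1 ** 2 + 2 * delta * f_norm1 * (
            math.sqrt(n + 1) * math.sqrt(kappa_next)
            - math.sqrt(n) * math.sqrt(kappa))
        kappa = kappa_next
    return extra

def noise_terms_batch(u_tilde, delta, f_norm1):
    """Closed-form noise-dependent part of the bound after all samples."""
    n = len(u_tilde)
    kappa = sum(ut ** 2 for ut in u_tilde)
    return (n * delta ** 2 * f_norm1 ** 2
            + 2 * delta * f_norm1 * math.sqrt(n) * math.sqrt(kappa))

u_tilde = [0.05, -0.05, 0.05, 0.05, -0.05]   # illustrative weighted inputs
recursive_value = noise_terms_recursive(u_tilde, delta=0.1, f_norm1=1.0)
batch_value = noise_terms_batch(u_tilde, delta=0.1, f_norm1=1.0)
```

Only the running sum of $\tilde{u}_k^2$ and the sample count need to be stored, which is what keeps the noisy-case update recursive.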


3.2  Worst  Case  Behavior  of  Ellipsoidal  Algorithms 

In this section we consider the worst case asymptotic behavior of the
ellipsoid algorithm derived in the previous section. We first derive a lower
bound on the worst case achievable volume and then show that by using
a random binary sequence as an input, this lower bound can be achieved
asymptotically in time.

Let the process be modeled by

$$y_k = \theta^T\phi_k + (\epsilon\Delta u)_k + (Fd)_k, \quad \|d\|_\infty \le \delta, \quad \|\Delta\|_\infty \le 1 \qquad (17)$$
The next result shows that given any input $\|u\|_\infty \le 1$, there is a fundamental
worst-case lower bound on the volume of the identified ellipsoids. We derive
a lower bound on the determinant of the ellipsoid matrix, which is
proportional to the square of the volume of the ellipsoid. The exact
relationship is given by

$$\mathrm{Vol}(E_N) = \frac{\pi^{m/2}\sqrt{\det(P_N)}}{\Gamma(m/2 + 1)} \qquad (18)$$

where $\Gamma$ is the Gamma function and for any integer n, $\Gamma(n + 1) = n!$ and
$\Gamma(1/2) = \sqrt{\pi}$.
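
The volume relation of Equation 18 is straightforward to evaluate. The sketch below takes the determinant of the ellipsoid matrix and the dimension as inputs; the function name `ellipsoid_volume` is hypothetical.

```python
import math

def ellipsoid_volume(det_P, m):
    """Volume of {x : x^T P^{-1} x <= 1} in R^m, given det(P)."""
    return math.pi ** (m / 2) * math.sqrt(det_P) / math.gamma(m / 2 + 1)

unit_disc = ellipsoid_volume(1.0, 2)   # area of the unit disc
unit_ball = ellipsoid_volume(1.0, 3)   # volume of the unit ball
scaled = ellipsoid_volume(4.0, 2)      # semi-axes whose product is 2
```

For $m = 2$ and $P = I$ this gives the area of the unit disc, and for $m = 3$ the volume of the unit ball, which is a quick check on the normalizing constant.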


Theorem 3.1 Assuming that the true plant, g, is in M, and the ellipsoids
are computed according to Equations 12 through 16,

$$\inf_{\|u\|_\infty \le 1}\ \sup_{g \in M}\ \sup_{\|d\|_\infty \le \delta} \det(P_N) \ge (\epsilon + \delta\|f\|_1)^{2m}$$


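Proof. Assuming that the original ellipsoid is centered at zero (no loss
in generality) and is large enough, one can easily show that the worst case
situation occurs when $\theta_{\mathrm{plant}} = 0$, $\Delta = 0$, and $d = 0$, which implies that
$y = 0$. Subsequent ellipsoids are then given by

$$\Theta_N = \{\theta \in \mathbb{R}^m : \theta^TP_N^{-1}\theta \le 1\} \qquad (19)$$

where $P_N = V_N\Gamma_N$, $\Gamma_N^{-1} = \sum_k\phi_k\phi_k^T$ and

$$V_N = N\delta^2\|f\|_1^2 + 2\epsilon\delta\|f\|_1\sqrt{N}\left(\sum_{k=1}^{N}u_k^2\right)^{1/2} + \epsilon^2\sum_{k=1}^{N}u_k^2$$

We will actually show that

$$[\det(P_N)]^{1/m} = V_N[\det(\Gamma_N)]^{1/m} \ge (\epsilon + \delta\|f\|_1)^2.$$

Since $\Gamma_N^{-1} = \sum_k\phi_k\phi_k^T$ and $\phi_k = [u_k \cdots u_{k-m+1}]^T$, it is true that

$$(\Gamma_N^{-1})_{jj} = \sum_{k=1}^{N-j+1}u_k^2 \quad\text{for all } 1 \le j \le m.$$

It is also true that $\Gamma_N^{-1}$ is positive semidefinite since for any $x \in \mathbb{R}^m$

$$x^T\left(\sum_{k=1}^{N}\phi_k\phi_k^T\right)x = \sum_{k=1}^{N}x^T\phi_k\phi_k^Tx = \sum_{k=1}^{N}(\phi_k^Tx)^2 \ge 0$$

We can now apply Hadamard's Inequality [8] to $\Gamma_N^{-1}$ to show that

$$\det(\Gamma_N^{-1}) \le \prod_{j=1}^{m}(\Gamma_N^{-1})_{jj} = \prod_{j=1}^{m}\left(\sum_{k=1}^{N-j+1}u_k^2\right) \le \left(\sum_{k=1}^{N}u_k^2\right)^m$$

Using the above facts we now have the following sequence of inequalities.

$$\begin{aligned}
\inf_{\|u\|_\infty \le 1}[\det(P_N)]^{1/m}
&= \inf_{\|u\|_\infty \le 1}\frac{N\delta^2\|f\|_1^2 + 2\sqrt{N}\epsilon\delta\|f\|_1\left(\sum_{k=1}^{N}u_k^2\right)^{1/2} + \epsilon^2\sum_{k=1}^{N}u_k^2}{\left[\det\left(\sum_{k=1}^{N}\phi_k\phi_k^T\right)\right]^{1/m}} \\
&\ge \inf_{\|u\|_\infty \le 1}\frac{N\delta^2\|f\|_1^2 + 2\sqrt{N}\epsilon\delta\|f\|_1\left(\sum_{k=1}^{N}u_k^2\right)^{1/2} + \epsilon^2\sum_{k=1}^{N}u_k^2}{\sum_{k=1}^{N}u_k^2} \\
&\ge \inf_{\|u\|_\infty \le 1}\frac{N\delta^2\|f\|_1^2}{\sum_{k=1}^{N}u_k^2} + \inf_{\|u\|_\infty \le 1}\frac{2\sqrt{N}\epsilon\delta\|f\|_1}{\left(\sum_{k=1}^{N}u_k^2\right)^{1/2}} + \epsilon^2 \\
&\ge (\epsilon + \delta\|f\|_1)^2
\end{aligned}$$

□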

We now show that this volume can be achieved if a special input is used.
Let $\{u_k\}$ be a random binary sequence obtained by a series of independent
Bernoulli trials where

$$\mathrm{Prob}\{u_k = -1\} = \mathrm{Prob}\{u_k = +1\} = 1/2$$

The main result is captured in the following theorem.

Theorem 3.2 If u is a random binary sequence, then

$$\lim_{N \to \infty}\ \sup_{g \in M_0}\ \sup_{\|d\|_\infty \le \delta}\det(P_N) = (\epsilon + \delta\|f\|_1)^{2m} \quad\text{i.p.}$$

Proof. We now consider $\Gamma$ more carefully. In particular, recalling that
$\phi_k = [u_k\ u_{k-1} \cdots u_{k-m+1}]^T$ we explicitly write $\Gamma^{-1}$ in terms of the input sequence
as

$$(\Gamma_N^{-1})_{i,j} = \sum_{k=1}^{N}u_{k-i+1}u_{k-j+1}$$

It is apparent that for all $i \le m$

$$(\Gamma_N^{-1})_{i,i} = \sum_{k=1}^{N}u_{k-i+1}^2 = N - i + 1$$

since $u_k = 0$ for $k \le 0$. Now we show that $\Gamma^{-1}$ becomes diagonally dominant.
First, notice that taking expectations we get

$$E\left\{\sum_{k=1}^{N}u_{k-i+1}u_{k-j+1}\right\} = \sum_{k=1}^{N}E\{u_{k-i+1}u_{k-j+1}\}
= \sum_{k=\max(i,j)}^{N}\delta_{ij} = (N - \max(i,j) + 1)\,\delta_{ij}$$

which means that the expectation of $\Gamma^{-1}$ is diagonal. We must now show
that the random variables in the sequence $\{u_{k-i+1}u_{k-j+1}\}_k$ are independent.
To see this, we first note that for a r.v. $u_iu_j$, fixing $u_j$ does not change the
pdf and we get

$$p(u_iu_j \mid u_j = 1) = p(u_iu_j \mid u_j = -1) = p_u(u_iu_j)$$

where $p_u$ is the pdf for each of the $u_k$'s. Assuming that $i < j$ (symmetry)
and defining $q = j - i$, we rewrite the above sum in the simpler form

$$(\Gamma_N^{-1})_{i,j} = \sum_{k=1}^{N-j+1}u_{k+q}u_k$$

Now it is clear that for $k < q + 1$ the elements are all independent. When
$k = q + 1$, we are summing the two r.v.'s $u_{2q+1}u_{q+1}$ and $u_{q+1}u_1$. But
since $u_{2q+1}$ and $u_1$ are independent, one can see that $u_{2q+1}u_{q+1}$ and $u_{q+1}u_1$
are also independent. We can use the same argument throughout the sum
and show that $\{u_{k+q}u_k\}_k$ is a sequence of independent Bernoulli random
variables taking on values in $\{-1, +1\}$, each with probability 1/2. Each has
a mean of zero and $\sigma = 1$. The standard deviation for the sum is then given by

$$\sigma = \sqrt{N - j + 1} \le \sqrt{N}$$

Now we can use Chebyshev's Inequality to show that for $i \ne j$, any $a > 0$
and $\epsilon > 0$

$$\Pr\left\{\frac{|(\Gamma_N^{-1})_{i,j}|}{N^{(1/2+a)}} \ge \epsilon\right\} \le \frac{N}{\epsilon^2N^{(1+2a)}} = \frac{1}{\epsilon^2N^{2a}}$$

which shows that for each $i \ne j$, the sequence of random variables

$$\left\{\frac{(\Gamma_N^{-1})_{i,j}}{N^{(1/2+a)}}\right\}_N$$

converges to zero in probability (i.p.), and some subsequence also converges
with probability 1 (w.p.1) [3].

This means that for any $0 < a < 1/2$,

$$\frac{1}{N^{(1/2+a)}}\Gamma_N^{-1} - \mathrm{diag}\left(N^{(1/2-a)}, \ldots, N^{(1/2-a)}\right) \to 0 \quad\text{i.p.}$$

and so

$$\det\left(\frac{1}{N}\Gamma_N^{-1}\right) \to 1 \quad\text{i.p.}$$

This establishes that

$$\det(P_N) = V_N^m\det(\Gamma_N) \to (\epsilon + \delta\|f\|_1)^{2m} \quad\text{i.p.}$$

□
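
The convergence claimed for random binary inputs can be illustrated numerically for $m = 2$. The sketch below is illustrative only: it assumes the worst case $y = 0$, so that $V_N = N\delta^2\|f\|_1^2 + 2\epsilon\delta\|f\|_1\sqrt{N}(\sum_k u_k^2)^{1/2} + \epsilon^2\sum_k u_k^2$, and uses the illustrative values $\epsilon = 0.05$, $\delta = 0.1$, $\|f\|_1 = 1$, a fixed seed, and a hypothetical function name.

```python
import random

def worst_case_det_P(N, eps, delta, f_norm1, seed=0):
    """det(P_N) for m = 2 in the worst case y = 0 with a random binary input."""
    rng = random.Random(seed)
    u = [rng.choice((-1, 1)) for _ in range(N)]
    # Entries of Gamma_N^{-1} = sum_k phi_k phi_k^T with phi_k = [u_k, u_{k-1}]^T
    # and the input taken as zero before the experiment starts.
    g11 = sum(x * x for x in u)                        # equals N
    g22 = sum(x * x for x in u[:-1])                   # equals N - 1
    g12 = sum(a * b for a, b in zip(u[1:], u[:-1]))    # off-diagonal entry
    sum_u2 = g11
    V = (N * delta ** 2 * f_norm1 ** 2
         + 2 * eps * delta * f_norm1 * (N ** 0.5) * (sum_u2 ** 0.5)
         + eps ** 2 * sum_u2)
    det_gamma_inv = g11 * g22 - g12 * g12
    # det(P_N) = V^m * det(Gamma_N) with m = 2
    return V ** 2 / det_gamma_inv, g12 / N

det_P, off_diag_ratio = worst_case_det_P(20000, eps=0.05, delta=0.1, f_norm1=1.0)
limit = (0.05 + 0.1 * 1.0) ** 4    # (eps + delta * ||f||_1)^(2m) with m = 2
```

The off-diagonal entry grows like $\sqrt{N}$ while the diagonal entries grow like $N$, so the normalized off-diagonal ratio is small and the determinant is close to the limiting value.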

We now derive a deterministic asymptotic optimal input result. Consider a
periodic Galois sequence having period $r = 2^n - 1$. This can be generated
with n shift registers and $s < n$ (modulo two) adders, for example [11]. A
periodic  Galois  sequence  “looks”  like  a  random  binary  sequence  in  the  sense 
that  the  autocorrelation  functions  are  very  similar.  Thus,  one  would  expect 
a  similar  asymptotic  result  to  hold  in  this  case. 
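
A Galois sequence of this kind can be generated with a linear feedback shift register. The sketch below is one possible construction, not necessarily the one in [11]: it assumes $n = 4$ and the recurrence $a_{k+4} = a_{k+1} \oplus a_k$ (which corresponds to the primitive polynomial $x^4 + x + 1$), and maps bits to $\pm 1$. The function names are hypothetical.

```python
def galois_sequence(length, n_bits=4, taps=(0, 1)):
    """+/-1 sequence from the recurrence a[k+n] = XOR of a[k+t] for t in taps."""
    bits = [1] + [0] * (n_bits - 1)      # any nonzero initial register state
    while len(bits) < length:
        k = len(bits) - n_bits
        nxt = 0
        for t in taps:
            nxt ^= bits[k + t]
        bits.append(nxt)
    return [1 - 2 * b for b in bits[:length]]   # bit 0 -> +1, bit 1 -> -1

def periodic_autocorrelation(u, lag):
    """Sum over one period of u_k * u_{k+lag}, with indices taken modulo r."""
    r = len(u)
    return sum(u[k] * u[(k + lag) % r] for k in range(r))

r = 2 ** 4 - 1                           # period 15
u = galois_sequence(r)
autocorr = [periodic_autocorrelation(u, lag) for lag in range(r)]
```

Over one period the autocorrelation equals $r$ at lag zero and $-1$ at every other lag, which is the sense in which the sequence resembles a random binary sequence.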

Theorem 3.3 Let the input be a Galois sequence of period r. Defining
$N = qr + p$ with $0 \le p < r$,

$$\lim_{q \to \infty}\ \sup_{g \in M_0}\ \sup_{\|d\|_\infty \le \delta}\det(P_N) = (\epsilon + \delta\|f\|_1)^{2m}$$

Proof. We first note that $(\Gamma_N^{-1})_{i,j}$ is simply $(N - j)$ times the
autocorrelation function of the input from 1 to $N - j$, evaluated at $j - i$. Now use
the fact from [11] that for u, a Galois sequence of period r, and any integer
$q > 0$

$$\sum_{k=1}^{qr}u_ku_{k+l} = \begin{cases} qr & l = 0 \\ -q & 1 \le l \le r - 1 \end{cases}$$

Note that when $l \ne 0$, we have $\sum_{k=1}^{r}u_ku_{k+l} = -1$. Thus, for any
$0 \le p \le r - 1$, we have an immediate bound $\left|\sum_{k=1}^{qr+p}u_ku_{k+l}\right| \le q + p \le q + r - 1$. The
result follows after using the above facts and an argument similar to the
proof of the previous theorem which shows that the diagonal entries of $\Gamma_N^{-1}$
are $O(qr)$, while the off-diagonal entries are at most $O(r) + O(q)$. □
The above results hold for the special model structure given by
Equation 17 (i.e., $W = \epsilon$). For the more general structure we can only get an
upper bound for the uncertainty given optimal inputs. This is given by the
following result.

Corollary 3.4 For the model given by

$$y = \frac{B(\theta)}{A}u + \Delta Wu + d$$

$$\inf_{\|u\|_\infty \le 1}\ \sup_{g \in M}\ \sup_{\|d\|_\infty \le \delta}\det(P_N) \le \left(\|A\|_1(\|W\|_1 + \delta)\right)^{2m} \qquad (20)$$


4  Some  Examples 

We now consider two simple examples which illustrate the convergence of
the algorithm when optimal inputs are used. The model is FIR with two
parameters and is given by

$$y_k = \theta_0u_k + \theta_1u_{k-1} + (\Delta u)_k + d_k$$

where $\|\Delta\|_\infty \le 0.05$, $\|d\|_\infty \le 0.1$, and $\|u\|_\infty \le 1$. In the first example
$\theta = 0$, $\Delta = 0$, and $d = 0$. This corresponds to the worst-case situation.
The input is taken as a random binary sequence in $\{-1, +1\}$. The ellipsoid
is initialized to $1000I$, but shrinks significantly after the first iteration. A
plot of the volume of the identified ellipsoids (for iterations 2-100) is shown
in Figure 1.

In the second example, the true plant is $\theta^T = [1.0\ \ 0.2]$ and $\Delta = 0$. The
disturbance d is chosen as a sequence of i.i.d. uniform random variables (in
$[-0.1, 0.1]$) and the input is the random binary sequence. The parameter
estimate is initialized to zero and the ellipsoid is initialized to $1000I$.
Figure 2 shows the second, fifth, and tenth ellipsoids. The error between the
estimate (ellipsoid center) and the true plant is plotted in Figure 3 while
the volume is shown in Figure 4. Finally, the algorithm is run with an input
which is taken from a uniform density and the convergence rate is compared
in Figure 5. For iterations < 15, the volumes are drastically different and
cannot be compared at one scale without complete loss of detail.


5  Conclusion 

In this paper we have presented a simple recursive approximate set
membership identification algorithm. The model used was parametric, linear
in parameters, and combined with nonparametric uncertainty and output
additive magnitude bounded noise. The worst-case asymptotic behavior of
this algorithm was studied in terms of the volume of uncertainty sets.
Furthermore, it was shown that there exist inputs which can guarantee that
the volume of uncertainty sets shrinks to this theoretical lower bound despite
worst-case conditions. A direction for future research is to extend this type
of analysis to more complex model structures.

Figure 1: Worst-Case Convergence of Ellipsoid Volume

Figure 2: Evolution of Ellipsoids

Figure 3: Convergence of Central Estimate

Figure 4: Optimal Convergence of Ellipsoid Volume

Figure 5: Convergence with Binary and Arbitrary Inputs
Abstract

In this paper, the problem of asymptotic identification for fading memory systems in the
presence of bounded noise is studied. For any experiment, the worst-case error is characterized
in terms of the diameter of the worst-case uncertainty set. Optimal inputs that minimize
the radius of uncertainty are studied and characterized. Finally, a convergent algorithm that
does not require knowledge of the noise upper bound is furnished. The algorithm is based on
interpolating data with spline functions, which are shown to be well suited for identification in
the presence of bounded noise; more so than other basis functions such as polynomials.
1 Introduction


Recently, there has been an increasing interest among the control community in the problem of
identifying plants for control purposes. This generally means that the identified model should
approximate the plant in the operator topology, since this allows the immediate use of robust
control tools for designing controllers [2, 5]. This problem is of special importance when the data
are corrupted with bounded noise. The case where the objective is to optimize prediction for a
fixed input was analyzed by many researchers (see [10] and the references therein). The problem
is more interesting when the objective is to approximate the original system as an operator, a
problem extensively discussed in [20], especially when the plant's order is not known a priori. For
linear time invariant plants, such approximation can be achieved by uniformly approximating the
frequency response ($H_\infty$-norm) or the impulse response ($\ell_1$ norm). In $H_\infty$ identification, it was
shown that robustly convergent algorithms can be furnished, when the available data is in the form
of a corrupted frequency response, at a set of points dense on the unit circle ([8, 6, 7]). When the
topology is induced by the $\ell_1$ norm, a complete study of asymptotic identification was furnished in
[18] for arbitrary inputs, and the question of optimal input design was addressed. Input design has
been addressed in stochastic settings (e.g. [11, 21]), but not in worst-case settings. Related work
on the worst-case identification problem was also reported in [13, 14, 12, 15, 3, 9].

In  this  paper,  the  work  of  Tse  et  al  [18]  is  extended  to  analyse  the  worst-case  asymptotic 
identification  of  nonlinear  fading  memory  systems.  As  in  [18],  the  study  is  done  in  two  steps.  The 
first  step  is  concerned  with  obtaining  tight  upper  and  lower  bounds  on  the  optimal  achievable  error 
by  any  identification  algorithm.  The  bounds  are  functions  of  the  input  used  for  the  experiments, 
and  this  can  be  arbitrary.  The  second  step  is  then  to  study  these  bounds  and  characterize  the 
inputs  that  will  minimize  them.  In  particular,  simple  topological  conditions  are  furnished  that 
guarantee  the  existence  of  an  algorithm  with  a  worst-case  error  within  a  factor  of  two  from  the 
lower bound. A near-optimal input is characterized so that the worst-case error is within a factor
of two of the bound on the noise.

It is noted that for the results on arbitrary experiments, the suggested optimal algorithms are
tuned to the knowledge of the bound on the noise. If, however, the near-optimal input is used, then
an untuned algorithm can be provided that results in a worst-case error equal to the noise bound,
$\delta$. Such an algorithm is based on interpolating data by spline functions of several variables.

The  rest  of  the  paper  is  organized  as  follows.  Section  2  gives  a  formal  definition  of  nonlinear 
fading  memory  systems.  Section  3  describes  the  identification  set-up.  Section  4  characterizes  the 
asymptotically  optimal  algorithms  and  the  associated  optimal  worst-case  errors  for  a  given  input. 
The  problem  of  optimal  inputs  is  addressed  in  Section  5.  An  optimal  untuned  algorithm  is  developed 
in  Section  6.  Section  7  contains  our  conclusions. 

2  Fading  Memory  Systems 

Let U be the set of one-sided infinite sequences whose $\ell_\infty$ norm is bounded by 1. This can be
viewed as the input set which contains the possible inputs that can be used for performing the
identification experiments. We consider the set of models X as discrete-time, causal functions from
U to $\mathbb{R}^\infty$; a plant $h \in X$ takes as input a sequence $u = (u_0, u_1, \ldots)$ to give an output sequence
$(h_0(u), h_1(u), \ldots)$. We assume that $h \in X$ further satisfies the following properties:

1. $h_n(u)$ depends continuously on $u_0, \ldots, u_{n-1}$.

2. h has equilibrium-initial behavior:

$$h_{n+1}(0u) = h_n(u) \quad\text{for all } n,$$

where $0u$ is the input $0, u_0, u_1, \ldots$. (In general, we will use the notation $vw$ for concatenation,
i.e. first apply the finite sequence v, then w. Since we are dealing with causal systems, we
shall slightly abuse the notation and write $h_n(w)$ to mean $h_n(u)$ where u is any infinite
sequence the first n elements of which are given by the finite sequence w.)

3. h has fading memory (FM): for each $\epsilon > 0$ there is some $T = T(\epsilon)$ such that for every k,
every $t \ge T$ and every finite sequences $v \in [-1, 1]^k$, $w \in [-1, 1]^t$,

$$|h_{t+k}(vw) - h_t(w)| < \epsilon$$

To measure the identification error, we shall consider the metric on X to be the one corresponding
to the operator gain:

$$\|h\|_X = \sup_{u \in U}\|h(u)\|_\infty.$$

It can be seen that systems in X satisfying the above property necessarily must have bounded
gain. This is a good norm to consider for robust control applications. However, it should
be noted that, for this norm, an upper bound on the amplitudes of input signals has to be known
a priori. In the above definition, this bound is normalized to one.

Examples of FM Systems:


Stable LTI systems.

For each $h \in \ell_1$ consider the input/output map $u \mapsto u * h$. It is clear that these systems satisfy the
above conditions. The operator-induced norm in this case is just the $\ell_1$ norm.

Hammerstein  systems. 
These  are  systems  which  are 
nonlinear  element: 


formed  by  composition  of  a  stable  LTI  system  followed  by  a  memoryless 
yn  =  g{{u*h)n) 


for  some  h  €  h  and  some  continuous  function  S  :  »  -  K.  K  18  easlly  to  veT1 ; 
satisfy  the  first  two  conditions  above.  If  we  assume  further  that  g  is  uniformly 

can  be  seen  that  the  system  also  has  fading  memory. 

For  further  details  on  fading  memory  operators,  see  [1,  16]. 


that  these  systems 
continuous,  then  it 
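
As a small numerical illustration of the two examples above, the following sketch checks that, for a finite impulse response, an input of amplitude one can drive the output to the $\ell_1$ norm of $h$, and then passes the same signal through a memoryless nonlinearity. The impulse response values, the choice `tanh` for $g$, and the function names are illustrative assumptions, and the convolution here includes the current input sample, which is a simplification of the indexing used in the text.

```python
import math

def convolve_output(h, u, n):
    """Output sample y_n = sum_k h[k] * u[n - k] of a causal FIR system."""
    return sum(h[k] * u[n - k] for k in range(min(len(h), n + 1)))

def l1_norm(h):
    """l1 norm of the impulse response, i.e. the operator gain in the LTI case."""
    return sum(abs(c) for c in h)

def worst_case_input(h, n):
    """Amplitude-one input that aligns signs with h so that |y_n| is maximal."""
    u = [0.0] * (n + 1)
    for k in range(min(len(h), n + 1)):
        u[n - k] = 1.0 if h[k] >= 0 else -1.0
    return u

h = [0.5, -0.25, 0.125]          # hypothetical stable FIR impulse response
n = len(h) - 1
u = worst_case_input(h, n)
lti_peak = abs(convolve_output(h, u, n))

# Same linear part followed by a memoryless nonlinearity g = tanh (an assumed choice).
nonlinear_peak = math.tanh(convolve_output(h, u, n))
```

Here `lti_peak` coincides with `l1_norm(h)`, which is the sense in which the operator-induced norm reduces to the $\ell_1$ norm for stable LTI systems.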


3  Identification  Setup 


The plant to be identified is known to be in a model set $M \subset X$. An input $u$ is selected from the
set $U$. We assume that the observed output $y$ is corrupted by some additive disturbance $d$ which
is unknown but magnitude-bounded, $\|d\|_\infty \le \delta$; i.e., if $h$ is the system, then

$$y = h(u) + d .$$
An identification algorithm is a sequence of mappings $\phi = \{\phi_n\}$ generating at each time $n$
an estimate $\phi_n(P_n u, P_n y) \in X$ of the unknown plant. Here, $P_n(u_0, u_1, \ldots, u_n, u_{n+1}, \ldots) =
(u_0, \ldots, u_n, 0, 0, \ldots)$ is the truncation operator.

Given an identification algorithm and a chosen input, we would like to consider the limiting
situation when longer and longer portions of the output sequence are observed. To this end, the
worst-case asymptotic error $e_\infty(\phi, u, \delta)$ of an algorithm $\phi$ is defined as the smallest number $r$ such that
for all plants $h \in M$ and for all disturbances $d$ with $\|d\|_\infty \le \delta$,

$$\limsup_{n \to \infty} \|\phi_n(P_n u, P_n(h(u) + d)) - h\|_X \le r .$$

Equivalently,

$$e_\infty(\phi, u, \delta) = \sup_{h \in M} \; \sup_{\|d\|_\infty \le \delta} \; \limsup_{n \to \infty} \|\phi_n(P_n u, P_n(h(u) + d)) - h\|_X .$$

The interpretation of this definition is that no matter what the true plant and the disturbances
are, the plant can eventually be approximated to within $e_\infty(\phi, u, \delta)$, using the estimates generated
by the identification algorithm. The convergence rate may depend on the plant and noise, i.e., for
a given $\epsilon$ there exists some $N(d, h, \epsilon)$ so that

$$\|\phi_n(y) - h\|_X \le e_\infty(\phi, u, \delta) + \epsilon$$

whenever $n > N$. We say that the convergence is uniform if $N(d, h, \epsilon)$ depends only on $\epsilon$. For more
motivation and discussion of these definitions, see [18].

The optimal worst-case asymptotic error $E_\infty(u, \delta)$ is defined as the smallest error achievable by
any algorithm:

$$E_\infty(u, \delta) = \inf_{\phi} e_\infty(\phi, u, \delta) .$$

Any algorithm for which the infimum is attained is said to be asymptotically optimal. We will obtain
a general characterization of the asymptotically optimal algorithms and the resulting optimal error,
for any given input $u$. We will then find conditions on the input $u$ to make this optimal worst-case
asymptotic error small.

4  Asymptotically  Optimal  Identification 

The characterization of asymptotically optimal algorithms and optimal asymptotic errors is in
terms of the uncertainty set, an important notion in information-based complexity theory. The
uncertainty set $S_n(u, y, \delta)$ at time $n$ is the set of all systems in the given model set $M$ which are
consistent with the observed data up until time $n$:

$$S_n(u, y, \delta) = \{ h \in M : \|P_n(y - h(u))\|_\infty \le \delta \} .$$

These are the plants which can give rise to the observed output for some valid disturbance sequence.
The infinite-horizon uncertainty set is

$$S_\infty(u, y, \delta) = \{ h \in M : \|y - h(u)\|_\infty \le \delta \} .$$

For a given set $A \subset X$, define the diameter of the set as:

$$\operatorname{diam}(A) = \sup_{g, h \in A} \|g - h\|_X$$

and let $D(u, \delta)$ be the diameter of the worst-case infinite-horizon uncertainty set:

$$D(u, \delta) = \sup_{h \in M} \; \sup_{\|d\|_\infty \le \delta} \operatorname{diam}(S_\infty(u, h(u) + d, \delta)) .$$
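
To make the uncertainty set and its diameter concrete, here is a minimal sketch on a finite horizon. It assumes a small, hypothetical candidate set of FIR models (so that the gain distance is just an $\ell_1$ distance between impulse responses); the model values, the input and the function names are illustrative and not taken from the paper.

```python
def fir_output(h, u):
    """Response of a causal FIR system h to the finite input u."""
    return [sum(h[k] * u[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(u))]

def sup_norm(x):
    return max(abs(v) for v in x)

def gain_distance(g, h):
    """l1 distance between impulse responses (operator-gain distance for LTI models)."""
    m = max(len(g), len(h))
    g = list(g) + [0.0] * (m - len(g))
    h = list(h) + [0.0] * (m - len(h))
    return sum(abs(a - b) for a, b in zip(g, h))

def uncertainty_set(models, u, y, delta):
    """All candidate models whose output is within delta of the data y."""
    return [h for h in models
            if sup_norm([a - b for a, b in zip(y, fir_output(h, u))]) <= delta]

def diameter(models):
    return max((gain_distance(g, h) for g in models for h in models), default=0.0)

models = [[0.0], [0.1], [0.5], [1.0]]   # hypothetical model set
u = [1.0, -1.0, 1.0, 1.0]               # input of amplitude one
y = [0.1 * v for v in u]                # data from the plant [0.1] with zero disturbance
delta = 0.2
S = uncertainty_set(models, u, y, delta)
diam_S = diameter(S)
```

With these numbers only the models `[0.0]` and `[0.1]` remain consistent with the data, and the diameter of the resulting set is well below $2\delta$.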

Under appropriate topological conditions on the model set, this quantity characterizes the
optimal asymptotic worst-case error. The following result is a generalization of Proposition 3.3,
Theorem 3.4 and Proposition 3.9 in the LTI case ([18]) to our present setting.

Theorem 4.1 If the model set $M \subset X$ is $\sigma$-compact (i.e., $M$ is a countable union of compact
sets), then

$$\frac{D(u, \delta)}{2} \le E_\infty(u, \delta) \le D(u, \delta) .$$

Furthermore, if $M$ is compact, then the convergence can be made uniform.

In the $\sigma$-compact case, an algorithm achieving an asymptotic error within the above bounds can
be realized using the principle of Occam's razor. Let $M = \cup_i M_i$, where the $M_i$'s are compact and
increasing. This decomposition gives a complexity index to each plant in $M$, as the index of the
smallest $M_i$ containing the plant. At each time $n$, the algorithm simply returns as an estimate any
plant in the uncertainty set $S_n$ with the smallest complexity index. Note that since the disturbance
bound $\delta$ is required to compute the uncertainty set, this algorithm is tuned to this information. On
the other hand, if $M$ is compact, one can use an algorithm which simply returns the plant in $M$
which best fits the input/output data observed so far. This algorithm attains an asymptotic error
within the above bounds with a uniform rate of convergence. It is also untuned to the disturbance
bound $\delta$.
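
The Occam's razor selection rule can be sketched as follows for a toy case in which each $M_i$ is a finite list of FIR models. The nested sets, the data and the helper names are assumptions made only for illustration; in the paper the $M_i$ are compact sets of fading memory systems, not finite lists.

```python
def fir_output(h, u):
    """Response of a causal FIR system h to the finite input u."""
    return [sum(h[k] * u[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(u))]

def consistent(h, u, y, delta):
    """True if h could have produced y under some disturbance bounded by delta."""
    return all(abs(a - b) <= delta for a, b in zip(y, fir_output(h, u)))

def occam_estimate(nested_sets, u, y, delta):
    """Return (index, model) for a consistent model of smallest complexity index."""
    for index, models in enumerate(nested_sets):
        for h in models:
            if consistent(h, u, y, delta):
                return index, h
    return None

M0 = [[0.0]]
M1 = M0 + [[0.5], [1.0]]
M2 = M1 + [[0.5, 0.5]]                  # increasing family M0 <= M1 <= M2
u = [1.0, 1.0, 1.0]
y = [0.55, 0.95, 1.0]                   # output of [0.5, 0.5] plus a small disturbance

tight = occam_estimate([M0, M1, M2], u, y, delta=0.1)
loose = occam_estimate([M0, M1, M2], u, y, delta=0.6)
```

The two calls show the dependence on the disturbance bound: with a small $\delta$ only the second-order model is consistent, whereas a larger $\delta$ admits a simpler model, which is why this rule is tuned to $\delta$.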

A slight extension of the above result yields essentially the same bounds for the case when $M$
is separable. The proof is along the same lines as the proofs of Lemma 4.5 and Proposition 4.6 in
[18]. The optimal algorithm has roughly the same structure as that for the $\sigma$-compact case.

Theorem 4.2 If $M$ is separable, then

$$\frac{D(u, \delta)}{2} \le E_\infty(u, \delta) \le \lim_{x \to \delta^+} D(u, x) .$$

To  apply  the  above  results,  we  now  look  at  the  topological  structure  of  some  classes  of  fading 
memory  systems  under  the  operator-induced  norm. 

Consider first the class of stable LTI systems. Since this corresponds to the space $\ell_1$, which is
separable, Theorem 4.2 is applicable in this case. More generally, we can in fact prove:

Theorem 4.3 The class of all fading memory systems is separable.

Proof. Define the class of $p$th-order memory systems, $M_p$, to be the set of all $f$ such that for
every $k$, every $t \ge p$ and all finite sequences $v \in [-1,1]^k$, $w \in [-1,1]^t$, $f_{t+k}(vw) = f_t(w)$.

It  is  clear  that  any  fading  memory  system  can  be  approximated  (in  the  operator-induced  norm) 
arbitrarily  closely  by  a  pth-order  memory  system  for  sufficiently  large  p.  Hence  it  suffices  to  prove 
that  Mp  is  separable  for  all  p. 

Now given any $f \in M_p$ we can find some continuous function $g : [-1,1]^p \to \mathbb{R}$ such that for all
time $n$, and all input $u$,

$$f_n(u) = g(u_{n-p}, \ldots, u_{n-1}) .$$

We call $g$ the memory function for $f$. Hence we have that $\|f\| = \|g\|_\infty$, where the infinity norm is
taken over $[-1,1]^p$. But the space of continuous functions with the uniform topology induced by
the $\ell_\infty$-norm, denoted by $C([-1,1]^p)$, is separable, and hence so is $M_p$. ∎

This means that when we look at fading memory systems, we can apply Theorem 4.2, and
reduce the analysis of the asymptotic optimal error to the analysis of the worst-case infinite-horizon
diameter.


5  Optimal  Inputs 

We now turn to the question of optimal inputs, i.e., inputs $u$ that minimize the worst-case
infinite-horizon diameter $D(u, \delta)$. First we state a simple lower bound. Let

$$\rho(M) := \sup\{\, r \mid 0 \le r' \le r \Rightarrow \exists\, g, h \in M \text{ with } \|g - h\| = r' \,\} .$$

Note that if $M$ is path connected, then $\rho(M) = \operatorname{diam}(M)$.

Lemma 5.1 If $2\delta < \rho(M)$, then $D(u, \delta) \ge 2\delta$ for all $u \in U$.

Proof. See [18]. ∎

Since $\rho(M) > 2\delta$ for most reasonable model sets, the above result gives a general lower
bound. We now investigate how to choose an input which achieves this bound.

Recall that $M$ is balanced if $h \in M$ implies $-h \in M$. For balanced and convex model sets, it
is well known from information-based complexity theory [17] that the worst-case diameter is equal
to the diameter of the uncertainty set when the output is identically equal to zero. The following
lemma summarizes this.

Lemma 5.2 Assume that $M$ is balanced and convex. Then, for all $u \in U$, $\delta > 0$,

$$D(u, \delta) = \operatorname{diam}(S_\infty(u, 0, \delta)) .$$

Call an input $u \in U$ persistently exciting for $M$ if the following property holds:

$$\|h(u)\|_\infty = \|h\|_X \quad \forall\, h \in M .$$

The  following  result  says  that  persistently  exciting  inputs  are  optimal. 

Theorem 5.1 Assume $M$ is balanced and convex.

1. If the input $u$ is persistently exciting, then $D(u, \delta) \le 2\delta$ for all $\delta > 0$.

2. If $u$ is persistently exciting, then $D(u, \delta) = 2\delta$ for each $0 < \delta < \rho(M)/2$.

Proof. (1): By Lemma 5.2, for all $\delta > 0$,

$$D(u, \delta) = 2 \sup\{\, \|h\| \mid h \in M,\ \|h(u)\|_\infty \le \delta \,\} .$$

Pick any $h \in M$ such that $\|h(u)\|_\infty \le \delta$. If $u$ is persistently exciting, this means that also $\|h\|_X \le \delta$,
so $D(u, \delta) \le 2\delta$.

(2): From Lemma 5.1, $D(u, \delta) \ge 2\delta$ for such $\delta$. The result follows from (1) above. ∎

It follows from Theorem 4.2, Theorem 4.3, Lemma 5.1 and the above theorem that one can
achieve nearly optimal asymptotic identification for the entire class of fading memory systems if
we use a persistently exciting input.

Corollary 5.3 Let $M = X$, the class of all fading memory systems. Then for any identification
algorithm $\phi$ and any input $u$, the worst-case asymptotic error $e_\infty(\phi, u, \delta)$ is lower bounded by $\delta$. If
$u$ is persistently exciting, then there is an algorithm which can achieve an asymptotic error of less
than $2\delta$.


A  natural  question  which  arises  at  this  point  is  whether  persistently  exciting  inputs  exist.  In 
the  stable  LTI  case,  this  was  shown  to  be  the  case  [18].  The  next  theorem  shows  that  they  also 
exist  when  the  model  set  consists  of  nonlinear  fading  memory  systems. 

Theorem 5.2 Let the model set $M$ be some subset of the set of fading memory systems. Let $W$
be any countable dense subset of $[-1,1]$ and consider any input $w_0 \in [-1,1]^\infty$ which contains all
possible finite sequences of elements of $W$. Then, $w_0$ is persistently exciting.

Proof. Assume that $h \in M$, $\|h\| = K < \infty$. Pick any $\epsilon > 0$. Let $T = T(\epsilon)$ as in the definition of
FM. By definition of the sup norm, there is some $w$ and some $T_1$ so that

$$\sup_{0 \le t \le T_1} |h_t(w)| > K - \epsilon .$$

Using the equilibrium-initial assumption and replacing $w$ by $0^T w$ and $T_1$ by $T + T_1$, we may assume
that

$$\sup_{T \le t \le T_1} |h_t(w)| > K - \epsilon .$$

By density of $W$ and continuity of $h_t(w)$ on past values of $w$, we may further assume that
$w(0), \ldots, w(T_1 - 1)$ take values in $W$. From the construction of $w_0$, there is some $k$ so that

$$w_0(k) = w(0),\ w_0(k+1) = w(1),\ \ldots,\ w_0(k + T_1 - 1) = w(T_1 - 1) .$$

Let $v$ be the finite sequence $w_0(0), \ldots, w_0(k-1)$ and $w$ be the finite sequence

$$w(0), w(1), \ldots, w(T_1 - 1),$$

which is equal to $w_0(k), w_0(k+1), \ldots, w_0(k + T_1 - 1)$. So $vw$ is the same as the first $T_1 + k$
elements of $w_0$.

By the FM property applied to these inputs, we have that

$$|h_{t+k}(vw) - h_t(w)| < \epsilon \quad \text{for each } t \ge T$$

(using the notational convention mentioned above for $h_s(w)$ if the length of $w$ is larger than $s$).
Then for such $t$,

$$|h_{t+k}(w_0)| = |h_{t+k}(vw)| > |h_t(w)| - \epsilon,$$

so that

$$\|h(w_0)\| \ge \sup_{T + k \le \tau \le T_1 + k} |h_\tau(w_0)| > K - 2\epsilon .$$

Thus, we conclude that $K = \|h\| \ge \|h(w_0)\| > K - 2\epsilon$ for all $\epsilon > 0$, so $\|h(w_0)\| = K$ as wanted. ∎
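
One way to picture an input of the kind required by Theorem 5.2 is the following sketch, which enumerates longer and longer words over finer and finer dyadic grids and concatenates them. The choice of the dyadic rationals as the countable dense set $W$, the enumeration order and the function names are assumptions made for illustration; any enumeration that eventually emits every finite word over $W$ would serve.

```python
from itertools import islice, product

def dyadic_grid(level):
    """Points j / 2**level in [-1, 1]; the union over all levels is dense in [-1, 1]."""
    n = 2 ** level
    return [j / n for j in range(-n, n + 1)]

def exciting_input():
    """Yield an infinite input that eventually contains every finite dyadic word."""
    level = 0
    while True:
        grid = dyadic_grid(level)
        # At each level, emit all words of length 1 .. level + 1 over the current grid.
        for length in range(1, level + 2):
            for word in product(grid, repeat=length):
                yield from word
        level += 1

def contains(prefix, word):
    """True if `word` occurs as a contiguous block of `prefix`."""
    k = len(word)
    return any(prefix[i:i + k] == word for i in range(len(prefix) - k + 1))

prefix = list(islice(exciting_input(), 58))   # levels 0 and 1 are complete after 58 samples
```

The length of the prefix needed to see all words of length $p$ on a grid of resolution $\epsilon$ grows like $(1/\epsilon)^p$, which is consistent with the complexity remarks made at the end of Section 6.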
6  An  Untuned  Algorithm


As remarked earlier, the asymptotically optimal algorithms for $\sigma$-compact and separable model
sets are tuned to the knowledge of $\delta$, the bound on the disturbance. It will be shown that for
fading memory systems, one can achieve asymptotically optimal identification without knowing $\delta$,
provided that we use a persistently exciting input. This is in fact a generalization of a result by
Mäkilä [13], which was proved in the context of stable LTI systems.

We  shall  make  use  of  multivariate  piecewise  linear  spline  functions  to  interpolate  between  the 
measured  data  to  form  an  approximation  to  the  unknown  plant.  This  is  a  generalization  of  the 
univariate  linear  spline,  but  because  in  higher  dimension  there  is  no  natural  ordering  of  the  data 
points,  the  description  of  the  interpolant  is  more  complicated. 


Consider the cube $I = [-1,1]^p \subset \mathbb{R}^p$. Let $x_1, x_2, \ldots, x_m$, $m > p$, be $m$ points in the interior
of the cube. We wish to construct a continuous, piecewise linear function $f : I \to \mathbb{R}$ such that
$f(x_i) = y_i$, $i = 1, 2, \ldots, m$, where the $y_i$'s are given data values to interpolate.

To facilitate the discussion, we need to first define several basic geometric concepts. A
$p$-dimensional simplex $S$ in $\mathbb{R}^p$ is the convex hull of $p + 1$ affinely independent points. Each of these
points is a vertex of $S$. The convex hull $F$ of any subset of these $p + 1$ points forms a face of $S$ if
there exists a hyperplane $H$ such that $S$ lies entirely on one side of $H$ and $S \cap H = F$. If $F$ is the
convex hull of $p$ points, it is called a facet. A point $v$ outside $S$ is said to be separated from $S$ by
a face $F$ if $v$ and $S$ lie on opposite sides of the $(p-1)$-dimensional hyperplane containing $F$.


The first step is to find a set of simplices $\{S_j\}$ such that (1) their combined vertex set is
$\{x_1, \ldots, x_m\}$, (2) the simplices only intersect at common faces, and (3) their union gives the convex hull
of the vertex set. This can be done inductively as follows: for $m = p + 1$, the set simply consists of
one simplex, which is the convex hull of the $p + 1$ points. Suppose now we have obtained a set of simplices
$S_1, S_2, \ldots, S_j$ to cover $m > p$ points, and consider one additional point $x_{m+1}$. If $x_{m+1} \in S_k$ for
some $k$, then we can simply replace $S_k$ with the $p + 1$ simplices formed by $x_{m+1}$ with each of the
facets of $S_k$. It is easy to see that these $p + 1$ simplices only intersect at common faces and their
union is $S_k$, so that the updated set of simplices now covers the $m + 1$ points. On the other hand, if
$x_{m+1}$ lies outside $P = \cup_{i=1}^{j} S_i$, then for each facet $F$ of some $S_k$ which separates $x_{m+1}$ from $P$, we
add a simplex formed by $x_{m+1}$ with $F$ to the set. It can also be proved that these added simplices
together with the original ones satisfy the three conditions.

Given these simplices, we can now define our interpolating linear spline $f$ as follows. First
define $f(x_i) = y_i$ at the given data points. For other $x \in [-1,1]^p$, if $x \in S_j$ for some $j$, let
$f(x) = \sum_i \alpha_i f(v_i)$, where the $v_i$'s are the vertices of $S_j$ and $x = \sum_i \alpha_i v_i$ with $\alpha_i \ge 0$,
$\sum_i \alpha_i = 1$. It is easy to check that
because of the above three conditions on the simplices, $f$ is well-defined and continuous. To extend
$f$ continuously outside $P = \cup_j S_j$, define $f(x)$ to be equal to the value of $f$ at the nearest point in
$P$ to $x$. Since $P$ is convex, this nearest point is unique, and this guarantees the continuity of this
extension.

If we view this interpolating process as an operator $T_m$ mapping the data vector $y =
(y_1, y_2, \ldots, y_m)$ to the piecewise linear interpolating function $(T_m(y))(x)$, then we can see that
this operator is linear and its gain, defined as

$$\|T_m\| = \sup_{\|y\|_\infty = 1} \|T_m(y)\|_\infty , \qquad (1)$$

is equal to one. This simple fact ensures that no matter how many data are obtained, noise in the
data will not be amplified in the interpolating process. This property of linear splines, which is not
shared by methods such as global polynomial interpolation, turns out to be the key to guaranteeing
the consistency of the estimates. A similar situation is encountered in linear system identification
from frequency response data [8], where one-dimensional splines are used instead of polynomials to
interpolate the noisy data to guarantee robustness of the identification procedure.
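
The unit-gain property can be seen on a single simplex: the interpolated value is a convex combination of the vertex data, so it can never exceed the largest data value in magnitude. The sketch below, for $p = 2$, uses the standard barycentric-coordinate formula for a triangle; the triangle, the data values and the function names are hypothetical.

```python
def barycentric(tri, x):
    """Barycentric coordinates of the point x with respect to the triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a1 = ((y2 - y3) * (x[0] - x3) + (x3 - x2) * (x[1] - y3)) / det
    a2 = ((y3 - y1) * (x[0] - x3) + (x1 - x3) * (x[1] - y3)) / det
    return a1, a2, 1.0 - a1 - a2

def interpolate(tri, values, x):
    """Piecewise linear interpolant on one simplex: sum_i alpha_i * f(v_i)."""
    return sum(a * v for a, v in zip(barycentric(tri, x), values))

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # one simplex of the triangulation
values = [1.0, -1.0, 0.5]                     # data y_i at the three vertices
inside = interpolate(tri, values, (0.25, 0.25))
at_vertex = interpolate(tri, values, (1.0, 0.0))
```

Because the coefficients are nonnegative and sum to one for any point inside the simplex, `abs(inside)` is bounded by the largest `abs` of the vertex values, which is the mechanism behind equation (1).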

With  the  above  basic  discussions  on  multivariate  linear  splines,  we  may  now  state  the  main 
result  of  this  section. 

Theorem 6.1 Let the model set $M = X$, the set of all fading memory systems. If the input $u$ is
persistently exciting, then there is an algorithm which can achieve an optimal worst-case asymptotic
error $e_\infty(\phi, u, \delta) = \delta$. This algorithm does not require the knowledge of $\delta$ in computing the estimates.

Proof. The structure of the algorithm is as follows. We view the model set $M$ as the closure of the
finite-memory systems $M_p$, $p = 0, 1, 2, \ldots$. We start by assuming that the true system is in $M_0$.
Data is observed until time $n(0)$, after which the algorithm comes up with an estimate $\hat{h}^{(0)} \in M_0$.
Then it moves on to the next model set $M_1$, and waits until time $n(1)$ before coming up with an
estimate $\hat{h}^{(1)} \in M_1$. The algorithm continues to move on to the model set of one higher order, to
produce a new estimate. It will be shown below how the time $n(p)$ is specified and the estimate
$\hat{h}^{(p)}$ is computed for each $p$.

Let $h$ be the true system. Let $\{\delta_p\}$ be any sequence which monotonically goes to zero.

Fix $p$, and let the time $n \in [n(p-1), n(p)]$. (This is when the algorithm is collecting data to
compute an estimate in $M_p$.) Consider all the blocks

$$(u_{n-p+1}, \ldots, u_{n-1}, u_n), \quad \forall\, n = n(p-1), \ldots, n(p)$$

in the input as data points in the cube $[-1,1]^p$. We maintain a simplex structure in $[-1,1]^p$ with
these data points as vertex set, and the structure is incrementally modified more or less according
to the procedure discussed earlier, with a slight twist. Let $C_n = \cup_j S_j$ be the union of the simplices
at time $n$ and $d_n$ be the distance between $C_n$ and the corner of $[-1,1]^p$ farthest away from $C_n$.
At time $n + 1$, one more data point is obtained. If $d_n < \delta_p$ and the new data point lies outside $C_n$,
then discard the new point. Otherwise update the simplex structure as described earlier.

Let $n(p)$ be the earliest time such that $d_{n(p)} < \delta_p$ and the diameter of the largest simplex in
$C_{n(p)}$ is less than $\delta_p$. At this time, the algorithm returns an estimate $\hat{h}^{(p)} = \phi_{n(p)}(h(u) + d)$ to be the
$p$-th order system with memory function as the piecewise linear spline interpolant of the current
simplex structure.
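
For memory order $p = 1$ the simplices are just intervals, and the stopping rule and the estimate can be sketched in a few lines. This is a simplified illustration under stated assumptions: the rule for discarding points outside $C_n$ is omitted (it is not needed to compute the stopping time in one dimension), and the input samples, the tolerance and the function names are hypothetical.

```python
import bisect

def stopping_time(u, delta_p):
    """Earliest n at which the data cover [-1, 1] to within delta_p with all gaps < delta_p."""
    pts = []
    for n, x in enumerate(u):
        bisect.insort(pts, x)
        if len(pts) < 2:
            continue
        d_n = max(pts[0] + 1.0, 1.0 - pts[-1])        # distance to the farthest corner
        largest_gap = max(b - a for a, b in zip(pts, pts[1:]))
        if d_n < delta_p and largest_gap < delta_p:
            return n
    return None

def linear_spline(xs, ys, x):
    """Piecewise linear interpolant on sorted nodes xs, held constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1.0 - t) * ys[i - 1] + t * ys[i]

u = [-1.0, 1.0, 0.0, -0.5, 0.5]
n_p = stopping_time(u, delta_p=0.6)
estimate_at_half = linear_spline([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0], 0.5)
```

Note that neither function takes the disturbance bound $\delta$ as an argument; only the design sequence $\delta_p$ enters, which is what makes the procedure untuned.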

We now show that $n(p) < \infty$ for every $p$. First we see that because the input is persistently
exciting, the $p$-blocks in $u$ are dense in $[-1,1]^p$. (Otherwise, there is a ball in $[-1,1]^p$ which does
not contain any blocks in $u$, and we can construct a $p$-step finite memory system with a continuous
memory function $f : \mathbb{R}^p \to \mathbb{R}$ which is positive at the centre of the ball and zero outside. Then
applying the input $u$ to the system will give a zero output while the gain of the system is nonzero,
thus contradicting the persistent excitation of $u$.) Hence there exists a time $m(p)$ such that
$d_{m(p)} < \delta_p$. After this time, the convex hull $C_n$ no longer expands. All the changes consist of
further partitioning of the simplices inside $C_n$ due to the new data points. Because the data points
are dense, it can be seen that the diameter of the largest simplex must go to zero. Hence, $n(p)$ is
finite.

We now claim that:

$$\limsup_{p \to \infty} \|\phi_{n(p)}(h(u) + d) - h\| \le \delta$$

for all $d$, $\|d\|_\infty \le \delta$. Note also that the algorithm defined above does not use the value of $\delta$ in
computing the estimate.

Take any $\epsilon > 0$. There exists some $q$ such that $M_q$ contains a system $h^\epsilon$ with

$$\|h - h^\epsilon\| < \epsilon . \qquad (2)$$

Let $g^\epsilon$ be the ($q$-step) memory function for $h^\epsilon$. For $p > q$, $\phi_{n(p)}(h(u) + d)$ is the spline interpolant
that approximates the unknown memory function, and $y = h(u) + d$ is the output. We can also
extend $g^\epsilon$ to a function on $[-1,1]^p$ which depends only on the last $q$ coordinates. Now,

$$\begin{aligned}
\|\phi_{n(p)}(h(u) + d) - g^\epsilon\|_\infty
&= \|\phi_{n(p)}(h(u)) + \phi_{n(p)}(d) - g^\epsilon\|_\infty && \text{by linearity of the interpolation operator} \\
&\le \|\phi_{n(p)}(h(u)) - g^\epsilon\|_\infty + \|\phi_{n(p)}(d)\|_\infty \\
&\le \|\phi_{n(p)}(h(u)) - g^\epsilon\|_\infty + \delta && \text{by Eqn. 1.}
\end{aligned}$$

The first term is the interpolation error when the data is noiseless, whereas the second term is
the error due to the noise. We now show that the first term can be made arbitrarily small for large
$p$.

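Since $g^\epsilon$ is continuous, $g^\epsilon$ is a uniformly continuous function on $[-1,1]^q$. Choose $\epsilon_1$ such that

$$\|x_1 - x_2\|_2 < \epsilon_1 \;\Rightarrow\; |g^\epsilon(x_1) - g^\epsilon(x_2)| \le \epsilon . \qquad (3)$$

Now pick $p$ sufficiently large such that $\delta_p < \epsilon_1$ and $p > q$. Let $g_p(x) = \phi_{n(p)}(h(u))$.

Now for any $x \in C_{n(p)}$, the convex hull, let $x = \sum_i \alpha_i x_i$, where the $x_i$ are the vertices of the simplex
containing $x$. Since $g_p$ agrees with the noiseless output data at the vertices, by Eqn. 2, for each $i$,

$$|g_p(x_i) - g^\epsilon(x_i)| < \epsilon . \qquad (4)$$

We have:

$$\begin{aligned}
|g_p(x) - g^\epsilon(x)|
&= \Big| \sum_i \alpha_i g_p(x_i) - g^\epsilon(x) \Big| \\
&\le \Big| \sum_i \alpha_i g^\epsilon(x_i) - g^\epsilon(x) \Big| + \epsilon && \text{by Eqn. 4} \\
&\le \sum_i \alpha_i |g^\epsilon(x_i) - g^\epsilon(x)| + \epsilon \\
&\le 2\epsilon
\end{aligned}$$

by Eqn. 3, since $\|x_i - x\|$ is less than the diameter of the simplex, which is smaller than $\epsilon_1$.

Now for $x$ outside $C_{n(p)}$, let $x'$ be the point in $C_{n(p)}$ which is closest to $x$. By definition of $n(p)$,
the distance of $x'$ from $x$ is at most $\delta_p < \epsilon_1$. Hence:

$$\begin{aligned}
|g_p(x) - g^\epsilon(x)|
&= |g_p(x') - g^\epsilon(x)| && \text{by definition of the interpolant} \\
&\le |g_p(x') - g^\epsilon(x')| + \epsilon && \text{by Eqn. 3} \\
&\le 3\epsilon && \text{from above.}
\end{aligned}$$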
Therefore, if $h_p$ is the finite-memory system with memory function $\phi_{n(p)}(h(u) + d)$, then

$$\begin{aligned}
\|h_p - h\|
&\le \|g_p - g^\epsilon\|_\infty + \delta + \epsilon \\
&\le \delta + 4\epsilon .
\end{aligned}$$

Since this is true for all $\epsilon$, it follows that

$$\limsup_{p \to \infty} \|h_p - h\| \le \delta$$

as desired. ∎

We would like to make a comment about the time complexity of this identification problem. It
can be easily seen that in general, the time needed to identify a system to a prescribed accuracy
grows exponentially with the order of the system, even when there is no noise. For example, if we
assume a certain Lipschitz condition on the order-$p$ memory function $g$, such as $|g(x) - g(y)| \le
M\|x - y\|$, then to identify the function up to accuracy $\epsilon$ (in the $\|\cdot\|_\infty$ norm), the number of data
points needed is at least the minimum number of $\epsilon$-balls needed to cover $[-1,1]^p$. Since the volume of an
$\epsilon$-ball is proportional to $\epsilon^p$, it is clear that this minimum number is at least proportional to $(1/\epsilon)^p$,
and hence so is the experiment length. This means that if $p$ is large, the experiment length will be
very long if we make no further assumption on the unknown plant.
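
The exponential growth can be made tangible with a short count. The sketch below assumes sup-norm balls (cubes of half-width $\epsilon$), for which the number needed to cover $[-1,1]^p$ is exactly $\lceil 1/\epsilon \rceil^p$; the function name and the sample values are illustrative, and constants such as the Lipschitz bound $M$ are ignored.

```python
import math

def covering_count(eps, p):
    """Number of sup-norm balls of radius eps needed to cover the cube [-1, 1]^p."""
    return math.ceil(1.0 / eps) ** p

# Experiment-length lower bound (up to constants) for accuracy eps = 0.25.
counts = {p: covering_count(0.25, p) for p in (1, 3, 6, 12)}
```

Doubling the memory order squares the count, so even a moderate order at a coarse accuracy already calls for a very long experiment.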

It is interesting to compare this situation with the problem of identifying linear finite impulse
response systems. For nonlinear systems the time complexity is exponential in the order whether
or not there is noise. For the linear case, while it takes only linear time to identify a FIR system
when there is no noise, it has been shown [3, 15] that the time complexity immediately
becomes exponential once we introduce any worst-case noise. Moreover, it has been demonstrated
that if we are willing to put a probability distribution on the noise, polynomial time complexity
can be obtained [19]. These facts show that while in the nonlinear case the plant uncertainty
dominates the complexity of the identification, in the linear case the complexity is sensitive
to how the noise is modelled.

7  Conclusions 

A framework for the analysis of asymptotic worst-case identification of LTI systems has been
extended to the setting of nonlinear fading memory systems. For model sets that are either
compact or separable, and for any experiment, the optimal worst-case error is always bounded by
twice the lower bound, which is the diameter of a certain uncertainty set. Optimal inputs which
minimize the diameter are characterized. It is also shown that accurate asymptotic identification
can be achieved by an optimal input, using an untuned algorithm based on spline interpolation.
Fig.  1.  Hybrid  discrete/continuous-time  system.


inter-sample  dynamics  of  the  hybrid  system,  and  that 
the  inter-sample  dynamics  are  governed  only  by  the  plant 
and  not  the  controller  dynamics.  We  use  the  latter  fact 
to  derive  explicit  bounds  on  the  approximation  [main 
inequality  (5)]  which  can  be  computed  a  priori  and  depend 
only on the plant. We also show that the rate of convergence
of the approximation is (1/n).

As already mentioned, sampled-data systems are periodic.
The main theoretical tool we use for dealing with
periodic systems is a lifting technique for continuous-time
systems developed in [1], [2].¹ The technique establishes a
strong correspondence between periodic systems and
time-invariant infinite-dimensional systems. In the next section,
we briefly describe the lifting and its application to the
sampled-data problem. We then set up an equivalent
infinite-dimensional problem whose solution is obtained
using an approximation procedure. Formulas for the
(almost) equivalent discrete-time problem are given in
Section III. In the later sections, the issue of the convergence
of the approximation procedure is investigated; this
is done by decomposing the equivalent infinite-dimensional
problem and analyzing the decomposition. In the
last section, a geometric interpretation is given for the
reduction of the infinite-dimensional problem, and it is
compared with the $H^\infty$ sampled-data problem from [1].
We also discuss possible reasons behind the fact that in
the $\ell^1$ sampled-data problem (in contrast to the $H^\infty$
sampled-data problem), the solutions are given by approximation,
rather than exact procedures.

Finally, we note that although the closed-loop sampled-data
system is periodically time varying, and thus one
cannot refer to the $\ell^1$ norm of its impulse response, it is
shown in [3] that the $L^\infty$-induced norm of a periodic
system can be interpreted as a type of $\ell^1$ norm of the
operator-valued "impulse response" of the lifted system.
This justifies calling this problem the $\ell^1$ sampled-data
problem.

II.  The  Lifting  Technique  in  Sampled-Data 
Systems 

In this section, we briefly summarize the lifting technique
for continuous-time periodic systems developed
in [1], [2], and apply it to the sampled-data problem.
The idea of the lifting technique is to put a periodic
continuous-time system in a strong correspondence with a
shift-invariant (i.e., discrete-time time-invariant) system,
which amounts to rearranging the original system so that
its periodicity can be viewed as shift invariance. To
accomplish this, we first define the lifting for signals, for
which the appropriate signal spaces need to be established.

¹Essentially the same technique was arrived at independently in [22]
and [23].

For continuous-time signals, we consider the usual
$L^\infty[0,\infty)$ space of essentially bounded functions [8], and its
extended version $L^\infty_e[0,\infty)$. We will also need to consider
discrete-time signals that take values in a function space;
for this, we define $\ell_X$ to be the space of all $X$-valued
sequences, where $X$ is some Banach space. We define $\ell^\infty_X$
as the subspace of $\ell_X$ with bounded norm sequences, i.e.,
where for $\{f_i\} \in \ell_X$, the norm $\|\{f_i\}\|_{\ell^\infty_X} := \sup_i \|f_i\|_X < \infty$.
Given any $f \in L^\infty_e[0,\infty)$, we define its lifting $\hat{f} \in \ell_{L^\infty[0,\tau]}$ as
follows: $\hat{f}$ is an $L^\infty[0,\tau]$-valued sequence, we denote it by
$\{\hat{f}_i\}$, and for each $i$

$$\hat{f}_i(t) := f(t + \tau i), \quad 0 \le t \le \tau .$$

The lifting can be visualized as taking a continuous-time
signal and breaking it up into a sequence of "pieces," each
corresponding to the function over an interval of length $\tau$
(see Fig. 2). Let us denote this lifting by $W_\tau : L^\infty_e[0,\infty) \to
\ell_{L^\infty[0,\tau]}$. $W_\tau$ is a linear isomorphism; furthermore, if
restricted to $L^\infty[0,\infty)$, then $W_\tau : L^\infty[0,\infty) \to \ell^\infty_{L^\infty[0,\tau]}$ is an
isometry, i.e., it preserves norms.
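
A discretized picture of the lifting $W_\tau$ is given in the sketch below: a signal sampled on a fine grid is cut into consecutive pieces, each covering one period. The sampling of the continuous-time signal, the number of samples per period and the function names are assumptions made for illustration only.

```python
def lift(samples, per_period):
    """Split a sampled signal into consecutive pieces of `per_period` samples each."""
    if len(samples) % per_period:
        raise ValueError("signal length must be a multiple of the period length")
    return [samples[i:i + per_period] for i in range(0, len(samples), per_period)]

def unlift(pieces):
    """Inverse of `lift`: concatenate the pieces back into one signal."""
    return [v for piece in pieces for v in piece]

def sup_norm(samples):
    return max(abs(v) for v in samples)

def lifted_norm(pieces):
    """Sup over the sequence index of the sup norm of each piece."""
    return max(sup_norm(piece) for piece in pieces)

f = [0.0, 0.5, -2.0, 1.0, 0.25, 0.75]   # hypothetical samples, two per period
f_hat = lift(f, per_period=2)
```

In this sampled setting `lifted_norm(f_hat)` equals `sup_norm(f)` and `unlift(f_hat)` recovers `f`, mirroring the statements that $W_\tau$ is an isomorphism and an isometry.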

Using the lifting of signals, one can define a lifting on
systems. Let $G$ be a linear continuous-time system on
$L^\infty_e[0,\infty)$; then its lifting $\hat{G}$ is the discrete-time system
$\hat{G} := W_\tau G W_\tau^{-1}$. This is illustrated in the commutative
diagram below:

$$\begin{array}{ccc}
\ell_{L^\infty[0,\tau]} & \xrightarrow{\ \hat{G}\ } & \ell_{L^\infty[0,\tau]} \\
W_\tau \uparrow & & \uparrow W_\tau \\
L^\infty_e[0,\infty) & \xrightarrow{\ G\ } & L^\infty_e[0,\infty)
\end{array}$$

Thus, $\hat{G}$ is a system that operates on Banach space
($L^\infty[0,\tau]$) valued signals; we will call such systems infinite
dimensional. Note that since $W_\tau$ is an isometry, if $G$ is
stable, i.e., a bounded linear map on $L^\infty$, then $\hat{G}$ is also
stable, and furthermore, their respective induced norms
are equal, $\|\hat{G}\| = \|G\|$. The correspondence between
a system and its lifting also preserves algebraic system
properties such as addition, cascade decomposition and
feedback (see [1] for details).

The usefulness of the lifting in the sampled-data problem
is the fact that if $G$ is a $\tau$-periodic system, then $\hat{G}$
commutes with the shift on $\ell_{L^\infty[0,\tau]}$, that is, $\hat{G}$ is shift
invariant. This basic fact allows us to treat continuous-time
periodic systems as discrete-time time-invariant systems,
albeit infinite-dimensional systems.

State  space  models  can  be  found  for  the  lifted  systems. 
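To illustrate, let $G$ be a continuous-time time-invariant system given by a state space realization $G = \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right]$. In [1] it was shown that the lifting $\hat G$ has a state space realization given by:

$$\hat G = \left[\begin{array}{c|c} \hat A & \hat B \\ \hline \hat C & \hat D \end{array}\right] = \left[\begin{array}{c|c} e^{A\tau} & e^{A(\tau - s)}B \\ \hline Ce^{At} & Ce^{A(t-s)}1(t-s)B + D\,\delta(t-s) \end{array}\right]$$

$$\hat B: L^\infty[0,\tau] \to R^{n_x}, \qquad \hat A: R^{n_x} \to R^{n_x},$$
$$\hat C: R^{n_x} \to L^\infty[0,\tau], \qquad \hat D: L^\infty[0,\tau] \to L^\infty[0,\tau]$$

where the operators $\hat C$, $\hat B$, $\hat D$ are given in terms of their kernel functions, and $1(\cdot)$ is the unit step function.

Fig. 2. The lifting $W_\tau: L^\infty_e[0,\infty) \to \ell_{L^\infty[0,\tau]}$.

Fig. 3. Equivalent problem.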

Notation: It simplifies the notation greatly to use the same symbol for an operator and its kernel; for example, $\hat D(t,s)$ [or $\hat B(s)$] refer to the kernel functions representing the operator $\hat D$ (or $\hat B$). For operators that map a function space to $R^n$, such as $\hat B$ above, we generally use $s$ (or $\hat s$) to denote the variable of the kernel function, and for operators that map $R^n$ to a function space, such as $\hat C$ above, we use the variable $t$ (or $\hat t$). The kernel representation for the operators $\hat B$, $\hat C$, $\hat D$ means that their action is given by

$$\hat B u = \int_0^\tau \hat B(s)u(s)\,ds, \qquad (\hat C x)(t) = \hat C(t)x, \quad t \in [0,\tau],$$

$$(\hat D u)(t) = \int_0^\tau \hat D(t,s)u(s)\,ds.$$

Note that the state space of $\hat G$ is finite dimensional (the $n_x$ in $R^{n_x}$ refers to the dimension of the state space of $G$), while its input and output spaces are infinite dimensional.
This  fact  is  significant  in  that,  although  lifted  systems  have 
infinite-dimensional  input  and  output  spaces,  they  can  be 
realized  with  a  state  space  of  dimension  no  larger  than 
the  dimension  of  the  original  continuous-time  state  space 
model. 
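The finite-dimensional state update of a lifted system can be checked on a scalar example. The sketch below assumes a hypothetical plant $\dot x = ax + bw$ (the values of `a`, `b`, `tau`, `x0` are illustrative) and evaluates $x_{k+1} = \hat A x_k + \hat B w_k$, with $\hat A = e^{a\tau}$ and $\hat B w = \int_0^\tau e^{a(\tau-s)} b\, w(s)\,ds$ computed by the trapezoid rule; for a constant input the result is compared with the closed-form solution of the differential equation.

```python
import math

def lifted_state_update(a, b, tau, x0, w, steps=2000):
    """One step x_{k+1} = A_hat x_k + B_hat w_k for the scalar plant x' = a x + b w.

    A_hat = e^{a tau}; B_hat acts through the kernel e^{a (tau - s)} b.
    """
    a_hat = math.exp(a * tau)
    h = tau / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(a * (tau - s)) * b * w(s)
    b_hat_w = total * h
    return a_hat * x0 + b_hat_w

# Illustrative numbers (assumptions, not taken from the paper).
a, b, tau, x0 = -1.0, 2.0, 0.5, 1.0
x_next = lifted_state_update(a, b, tau, x0, lambda s: 1.0)
# Closed-form state at t = tau for the constant input w = 1.
exact = math.exp(a * tau) * x0 + b * (math.exp(a * tau) - 1.0) / a
print(x_next, exact)
```

The input `w` is an entire function on $[0,\tau]$, yet the state it produces is a single number, which is the sense in which the lifted system has an infinite-dimensional input space and a finite-dimensional state space.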

To apply the lifting to the sampled-data problem, consider again the standard problem of Fig. 1, and denote the closed-loop operator by $\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)$. Since the lifting is an isometry, we have that $\|\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)\| = \|W_\tau \mathcal F(G, \mathcal H_\tau C \mathcal S_\tau) W_\tau^{-1}\|$; this is shown in Fig. 3(a). In Fig. 3(b), we lump the lifting operators $W_\tau$ and $W_\tau^{-1}$ and the sample and hold operators, and consider a new generalized plant $\tilde G$. $\tilde G$ is a discrete-time system with one infinite-dimensional input and output (corresponding to $\hat w$ and $\hat z$) and one finite-dimensional input and output (corresponding to $u$ and $y$). Thus, $\mathcal F(\tilde G, C) = W_\tau \mathcal F(G, \mathcal H_\tau C \mathcal S_\tau) W_\tau^{-1}$, which means that the closed-loop operator $\mathcal F(\tilde G, C)$ is in fact the lifting of the closed-loop operator $\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)$. Since the lifting $W_\tau$ is an isometry, we have then characterized the $L^\infty$-induced norm of the hybrid system as the induced norm of the time-invariant system $\mathcal F(\tilde G, C)$. The conclusion is that the problem of minimizing the $L^\infty$-induced norm of the sampled-data system is equivalent to that of minimizing the induced norm of the infinite-dimensional but time-invariant system $\mathcal F(\tilde G, C)$. The previous discussion together with the characterization of internal stability for hybrid systems in [12] (conditions for nonpathological sampling) yields the following theorem.

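Theorem 1: Let $G$ and $\tilde G$ be as in Fig. 3, then for any finite-dimensional $C$:

i) $\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)$ is internally stable if and only if $\mathcal F(\tilde G, C)$ is.

ii) $\|\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)\| = \|\mathcal F(\tilde G, C)\|$.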

This reformulation of the sampled-data problem to the problem with $\tilde G$ has several advantages. First, the controller has no "structural constraints" on it, in contrast to the previous formulation where the controller is constrained to be a sampled-data controller, i.e., of the form $\mathcal H_\tau C \mathcal S_\tau$. Second, both the controller $C$ and the generalized plant $\tilde G$ are shift invariant; thus, the periodicity of the original system is "removed." Third, all parts of the system are operating over the same time set (discrete time). The price paid for these advantages is the infinite dimensionality of the input and output spaces. In this paper, we will show how one can reduce the problem to a finite-dimensional one by "approximating" the input and output spaces by finite-dimensional spaces, thus reducing the problem to a standard finite-dimensional $\ell^1$ problem.

We now present (from [1]) a state space realization for the new generalized plant $\tilde G$ which will be useful in studying the problem further. Let the original continuous-time plant $G$ be given by the following realization

$$G = \left[\begin{array}{c|cc} A & B_1 & B_2 \\ \hline C_1 & D_{11} & D_{12} \\ C_2 & 0 & 0 \end{array}\right].$$

It is assumed that the sampler is preceded by a presampling filter which is a strictly causal linear system. This is a realistic assumption since an ideal sampler is not a physical device; a real sampler can be modeled as an integrator with a fast time constant followed by an ideal sampler.


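The system shown above represents a generalized plant with the presampling filter absorbed in it. The fact that $D_{21} = D_{22} = 0$ is due to the strict causality of the presampling filter; this also guarantees that the ideal sampler only operates on continuous signals. It can be shown ([1]) that a realization for the generalized plant $\tilde G$ (Fig. 3) is given by

$$\tilde G = \begin{bmatrix} \tilde G_{11} & \tilde G_{12} \\ \tilde G_{21} & \tilde G_{22} \end{bmatrix} = \left[\begin{array}{c|cc} \hat A & \hat B_1 & \hat B_2 \\ \hline \hat C_1 & \hat D_{11} & \hat D_{12} \\ C_2 & 0 & 0 \end{array}\right] = \left[\begin{array}{c|cc} e^{A\tau} & e^{A(\tau-s)}B_1 & \Psi(\tau)B_2 \\ \hline C_1e^{At} & C_1e^{A(t-s)}1(t-s)B_1 + D_{11}\delta(t-s) & C_1\Psi(t)B_2 + D_{12} \\ C_2 & 0 & 0 \end{array}\right]$$

where $\Psi(t) := \int_0^t e^{As}\,ds$. The system $\tilde G$ has the following input and output spaces:

$$\tilde G_{11}: \ell_{L^\infty[0,\tau]} \to \ell_{L^\infty[0,\tau]}, \qquad \tilde G_{12}: \ell_{R^{n_u}} \to \ell_{L^\infty[0,\tau]},$$
$$\tilde G_{21}: \ell_{L^\infty[0,\tau]} \to \ell_{R^{n_y}}, \qquad \tilde G_{22}: \ell_{R^{n_u}} \to \ell_{R^{n_y}}.$$

We also note that because of Theorem 1, suboptimal solutions to the above problem will also be suboptimal (with the same norm) for the hybrid system.

The above infinite-dimensional problem is solved by an approximation procedure through solving a standard MIMO $\ell^1$ problem. The idea we use is similar to that in [10] and [14].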

The main theme of this paper is to approximate the infinite-dimensional input and output spaces $L^\infty[0,\tau]$ by finite-dimensional spaces. Bounds on the approximation of the closed-loop system (i.e., with controller) will be obtained that are characterized only in terms of the operators $\hat B_1$, $\hat C_1$, $\hat D_{12}$, $\hat D_{11}$, which in turn are characterized by the original continuous-time plant and independent of the controller.

The interpretation that can be given to the operators $\hat B_1$, $\hat C_1$, $\hat D_{12}$, $\hat D_{11}$ is that they characterize the inter-sample
behavior  of  the  overall  system.  In  the  lifted  formulation 
of  the  sampled-data  problem,  the  state  of  the  system 
is  the  state  of  the  plant  G  and  the  state  of  the  con¬ 
troller  C,  both  of  which  evolve  in  discrete  time.  The 
controller  thus  has  an  effect  on  the  state  of  the  system 
only  at  the  sampling  instants,  and  the  inter-sample 
behavior  is  governed  only  by  the  plant  dynamics.  This  fact 
is  made  intuitive  by  the  observation  that  in  between  the 
samples,  the  system  is  essentially  operating  in  open  loop 
since  there  is  no  feedback  (u  is  constant  in  between 
samples). 

The  lifting  of  the  sampled-data  problem  makes  clear 
that  the  inter-sample  dynamics  are  characterized  by  the 
operators $\hat B_1$, $\hat C_1$, $\hat D_{12}$, $\hat D_{11}$, and thus the issue of approximating these dynamics essentially amounts to approximating the operators, which are independent of the controller.
The  foregoing  ideas  are  pursued  in  the  next  sections. 


III. Solution Procedure


Using the lifting we are able to convert the problem of finding a controller to minimize the $L^\infty$-induced norm of the hybrid system (Fig. 1) into the following standard problem with an infinite-dimensional generalized plant $\tilde G$:

$$\gamma_{\rm opt} := \inf_{C\ \text{stabilizing}} \|\mathcal F(G, \mathcal H_\tau C \mathcal S_\tau)\| = \inf_{C\ \text{stabilizing}} \|\mathcal F(\tilde G, C)\|. \qquad (2)$$

In [10] and [14], multirate sampling is used to obtain discrete-time systems that approximate the continuous-time behavior of hybrid systems. This approximation procedure was used in [10] to address the $\ell^1$ sampled-data problem. The approximation procedure we use is essentially equivalent to that in [10]; however, since we introduce it directly as an approximation to the lifted problem (2), the nature of the approximation is more transparent and we are able to explicitly isolate the parts of the system that need to be approximated independently of the controller. The consequence is that we are able to obtain explicit bounds on the degree of approximation in terms of constants that can be computed a priori, and that are dependent only on the plant.

We now describe the approximation procedure. Let $\mathcal H_n$ and $\mathcal S_n$ be the following operators defined between $L^\infty_q[0,\tau]$ and $\ell^\infty_q(n)$ ($\ell^\infty_q(n)$ is $R^{n \times q}$ with the maximum norm):

$$\mathcal S_n: L^\infty_q[0,\tau] \to \ell^\infty_q(n), \qquad (\mathcal S_n u)(i) := u\!\left(\tfrac{\tau}{n}\, i\right), \qquad u \in L^\infty_q[0,\tau];$$

$$\mathcal H_n: \ell^\infty_q(n) \to L^\infty_q[0,\tau], \qquad (\mathcal H_n u)(t) := u\!\left(\left\lfloor \tfrac{tn}{\tau} \right\rfloor\right), \qquad \{u(i)\} \in \ell^\infty_q(n).$$

Strictly speaking, $\mathcal S_n$ is not an operator on $L^\infty_q$ but on the subspace of left and right continuous functions; this distinction is irrelevant here since in our setting, assumptions are made to guarantee that $\mathcal S_n$ operates only on continuous signals. The above operators can be thought of as "fast" sample and hold operators (see Fig. 5). For simplicity of notation we will suppress the dimension $q$ in the sequel.

Now to approximate the infinite-dimensional problem, we use the approximate closed-loop system $\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n$ (see Fig. 4), and for each $n$ we define

$$\gamma_n := \inf_{C\ \text{stabilizing}} \|\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n\|. \qquad (3)$$


This new problem now involves the induced norm over $\ell^\infty(n)$, i.e., it is a standard MIMO $\ell^1$ problem.

Let us denote the generalized plant associated with $\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n$ by $\tilde G_n$, that is, $\tilde G_n$ is such that (see Fig. 4)

$$\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n = \mathcal F(\tilde G_n, C).$$

A realization for $\tilde G_n$ is given by,

The new operators, which are now matrices, are computed to be

where $\{\cdot\}_n$ means the first $n \times n$ blocks of the impulse response matrix of the discrete-time system given by the realization in $\{\cdot\}$.

Fig. 4. The system $\tilde G_n$.

Fig. 5. The operators $\mathcal H_n$ and $\mathcal S_n$.

The solution to the original infinite-dimensional problem (and thus to the sampled-data problem) is as follows: $n$ can be chosen large enough such that if the designed controller $C_n$ is almost optimal for the approximate problem (3), then it is almost optimal for the original problem (2). In essence, this approximation scheme "converges," i.e., one can obtain almost optimal controllers by choosing $n$ large enough and solving a MIMO $\ell^1$ problem. Exactly what convergence means here is described next.

IV. Design Bounds

In this section we investigate the nature of the approximation of $\|\mathcal F(\tilde G, C)\|$ by $\|\mathcal F(\tilde G_n, C)\|$. In order to show that the synthesis procedure outlined in the previous section yields controllers with performance arbitrarily close to the optimal, one needs to obtain explicit bounds on the degree of approximation of $\|\mathcal F(\tilde G, C)\|$ by $\|\mathcal F(\tilde G_n, C)\|$.

Let us begin with analysis. Note that since $\mathcal F(\tilde G, C)$ is an infinite-dimensional system, its $\ell^\infty_{L^\infty[0,\tau]}$-induced norm is not readily computable. A method of computing $\|\mathcal F(\tilde G, C)\|$ comes from the limit

$$\|\mathcal F(\tilde G, C)\| = \lim_{n \to \infty} \|\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n\| =: \lim_{n \to \infty} \|\mathcal F(\tilde G_n, C)\| \qquad (4)$$

for a fixed $C$. This formula can be proved using arguments about the approximation of continuous functions by simple functions in $L^\infty$ ([19]), and also follows immediately from the main inequality below. Since $\mathcal F(\tilde G_n, C)$ is a time-invariant MIMO system and $\|\mathcal F(\tilde G_n, C)\|$ is its $\ell^1$ norm, it can be computed to any desired accuracy; consequently, by (4) the actual norm $\|\mathcal F(\tilde G, C)\|$ can be computed to any desired accuracy. However, (4) is by far not sufficient to show the convergence of the synthesis procedure, since given only (4), the rate of convergence may depend on the choice of $C$.
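The fast sample and hold operators that enter (4) can be sketched for scalar signals as follows. This is an illustrative sketch only: the function names are hypothetical, the sampler takes $n$ values at $t = \tau i/n$, $i = 0, \ldots, n-1$, and a tiny offset is added inside the floor so that floating-point rounding at the sampling instants does not select the previous interval.

```python
import math

def fast_sample(u, tau, n):
    """(S_n u)(i) = u(tau * i / n) for i = 0, ..., n-1."""
    return [u(tau * i / n) for i in range(n)]

def fast_hold(v, tau):
    """(H_n v)(t) = v[floor(t * n / tau)] for t in [0, tau)."""
    n = len(v)
    def held(t):
        # The 1e-12 offset is a numerical guard, not part of the definition.
        i = min(int(math.floor(t * n / tau + 1e-12)), n - 1)
        return v[i]
    return held

tau, n = 1.0, 8                                  # illustrative values
v = [math.cos(3.0 * i) for i in range(n)]        # an arbitrary element of l^inf(n)

held = fast_hold(v, tau)
roundtrip = fast_sample(held, tau, n)            # S_n H_n v
hold_norm = max(abs(held(k / 1000.0)) for k in range(1000))
print(roundtrip == v, hold_norm, max(abs(x) for x in v))
```

Sampling a held sequence returns the sequence itself, and the hold preserves the maximum norm, which is why neither operator can increase an induced norm.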

Our objective is to obtain explicit bounds on $\|\mathcal F(\tilde G, C)\|$ that do not depend on the controller, in the following form.

Main Inequality: There are constants $K_0$ and $K_1$ which depend only on $G$, such that for $n > 2n_x$, and $\tau/n$ nonpathological

$$\|\mathcal F(\tilde G_n, C)\| \le \|\mathcal F(\tilde G, C)\| \le \frac{K_1}{n} + \left(1 + \frac{K_0}{n}\right)\|\mathcal F(\tilde G_n, C)\|. \qquad (5)$$

Remarks:

a) The significance of the bound (5) is that it is exactly what is needed for synthesis. When one performs an $\ell^1$ design on the approximate discretization $\tilde G_n$, the result is a controller that keeps $\|\mathcal F(\tilde G_n, C)\|$ small, but the objective is to keep the $L^\infty$-induced norm of the hybrid system (or equivalently $\|\mathcal F(\tilde G, C)\|$) small, and the inequality (5) guarantees this. It is thus essential that we bound the hybrid norm from above by a function of $\|\mathcal F(\tilde G_n, C)\|$.

b) The above inequality shows that the approximation converges at a rate of $(1/n)$.

The first inequality in (5) is easy to obtain; first note that

$$\|\mathcal F(\tilde G_n, C)\| \le \|\mathcal F(\tilde G, C)\| \qquad \forall n,$$

since

$$\|\mathcal F(\tilde G_n, C)\| = \|\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n\| \le \|\mathcal S_n\|\, \|\mathcal F(\tilde G, C)\|\, \|\mathcal H_n\| \le \|\mathcal F(\tilde G, C)\|$$

because $\|\mathcal H_n\| \le 1$ on $\ell^\infty(n)$ and $\|\mathcal S_n\| \le 1$ on the subspace of $L^\infty$ for which it is defined.
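The two features of (5), approach from below and a $1/n$ rate, can be observed on a single block of the lifted system. The sketch below is an illustration under stated assumptions, not the closed-loop computation: it takes a hypothetical scalar plant with $D_{11} = 0$, so that the feedthrough kernel is $c\,e^{a(t-s)}1(t-s)\,b$, computes its $L^\infty[0,\tau]$-induced norm in closed form, and compares it with the induced $\infty$-norm (maximum absolute row sum) of the $n \times n$ matrix obtained by fast sampling and holding.

```python
import math

def true_norm(a, b, c, tau):
    """L-infinity induced norm of (D u)(t) = int_0^t c e^{a(t-s)} b u(s) ds on [0, tau], a != 0."""
    return abs(c * b) * (math.exp(a * tau) - 1.0) / a

def sampled_norm(a, b, c, tau, n):
    """Max absolute row sum of the matrix of S_n D H_n.

    Entry (i, j) is the integral of the kernel over [j h, (j+1) h) evaluated at t = i h.
    """
    h = tau / n
    best = 0.0
    for i in range(n):
        row = 0.0
        for j in range(i):   # entries with j >= i vanish by causality
            entry = c * b * (math.exp(a * (i - j) * h) - math.exp(a * (i - j - 1) * h)) / a
            row += abs(entry)
        best = max(best, row)
    return best

a, b, c, tau = -1.0, 1.0, 1.0, 1.0     # illustrative plant data
exact = true_norm(a, b, c, tau)
rates = (4, 8, 16, 32, 64)
gaps = {n: exact - sampled_norm(a, b, c, tau, n) for n in rates}
for n in rates:
    print(n, gaps[n], n * gaps[n])
```

For this example the gap is positive for every $n$ and the product of $n$ with the gap stays bounded, consistent with convergence from below at rate $1/n$.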

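One way to utilize the main inequality for getting a priori guarantees on the hybrid norm in terms of the discrete-time $\ell^1$ problem is guided by the following: for a fixed $n$, if one performs a MIMO $\ell^1$ design (as in [9], [17]) on $\tilde G_n$ and obtains a $\gamma_n + \varepsilon$ optimal controller (given by $C_n$), i.e., $\|\mathcal F(\tilde G_n, C_n)\| \le \gamma_n + \varepsilon$, then inequality (5) provides that if $C_n$ is implemented in the hybrid system, then

$$\gamma_{\rm opt} \le \|\mathcal F(\tilde G, C_n)\| \le \frac{K_1}{n} + \left(1 + \frac{K_0}{n}\right)\|\mathcal F(\tilde G_n, C_n)\| \le \frac{K_1}{n} + \left(1 + \frac{K_0}{n}\right)(\gamma_n + \varepsilon) \le \frac{K_1}{n} + \left(1 + \frac{K_0}{n}\right)(\gamma_{\rm opt} + \varepsilon) \qquad (6)$$

where the last inequality follows from $\gamma_n \le \gamma_{\rm opt}$, which is a consequence of the first inequality in (5).

The above inequality can be simplified by using an upper bound on $\gamma_{\rm opt}$; such a bound can be obtained by finding any stabilizing controller $C_0$ and computing an upper bound on the hybrid norm of $\mathcal F(\tilde G, C_0)$ (by using the main inequality with a large $n$). Call that upper bound $M$. Then by using $\gamma_{\rm opt} \le M$, inequality (6) can be rewritten as

$$\gamma_{\rm opt} \le \|\mathcal F(\tilde G, C_n)\| \le \frac{K_1 + K_0(M + \varepsilon)}{n} + \varepsilon + \gamma_{\rm opt}.$$

Thus, in order that $C_n$ guarantees $\|\mathcal F(G, \mathcal H_\tau C_n \mathcal S_\tau)\| \le \gamma_{\rm opt} + \delta$ for any $\delta > 0$, we choose $\varepsilon$ and $n$ a priori to satisfy

$$\delta \ge \frac{K_1 + K_0(M + \varepsilon)}{n} + \varepsilon.$$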
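The a priori choice of $n$ can be written as a small helper. This is a sketch under assumptions: the function names and the numerical values of $K_0$, $K_1$, $M$ are hypothetical, and the nonpathological-sampling requirement on $\tau/n$ is not checked here; only the condition $\delta \ge (K_1 + K_0(M + \varepsilon))/n + \varepsilon$ and the requirement $n > 2n_x$ are enforced.

```python
import math

def guaranteed_gap(K0, K1, M, eps, n):
    """Upper bound on ||F(G~, C_n)|| - gamma_opt implied by (6) with gamma_opt <= M."""
    return (K1 + K0 * (M + eps)) / n + eps

def min_fast_rate(K0, K1, M, eps, delta, n_x):
    """Smallest n with guaranteed_gap <= delta and n > 2 * n_x."""
    if not 0.0 < eps < delta:
        raise ValueError("need 0 < eps < delta")
    n_bound = (K1 + K0 * (M + eps)) / (delta - eps)
    return max(math.ceil(n_bound), 2 * n_x + 1)

# Hypothetical constants, for illustration only.
K0, K1, M = 3.0, 2.0, 5.0
eps, delta, n_x = 0.05, 0.11, 2
n = min_fast_rate(K0, K1, M, eps, delta, n_x)
print(n, guaranteed_gap(K0, K1, M, eps, n))
```

Because the bound requires $\varepsilon < \delta$, tightening the design tolerance $\varepsilon$ of the discrete $\ell^1$ problem lowers the required fast sampling rate $n$ for the same $\delta$.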

It is worthwhile noting that the problem of minimizing $\|\mathcal F(\tilde G_n, C)\|$ is immediately a standard $\ell^1$ problem with a time-invariant plant. Also, we note that even though the approximation problem is essentially equivalent to a multirate sampled-data problem, it reflects no structural constraints on the controller. General multirate sampled problems do not share this property (see [7]).

The  next  section  is  devoted  to  the  derivation  of  the 
main  inequality  (5).  Several  interesting  issues  come  up, 
and  we  get  bounds  on  the  approximation  by  characterizing 


the approximation of the infinite-dimensional parts of $\tilde G$, namely the operators $\hat B_1$, $\hat C_1$, $\hat D_{12}$, $\hat D_{11}$.


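V. Decomposition and Approximation of $\tilde G$


It will be very helpful in the derivation of (5) to introduce a decomposition of the infinite-dimensional system $\tilde G$ by "extracting" the infinite-dimensional parts of the system. The basic idea is roughly that the behavior of the hybrid system between samples is essentially governed by the infinite-dimensional parts of $\tilde G$, namely the operators $\hat B_1$, $\hat C_1$, $\hat D_{12}$, and $\hat D_{11}$. These operators are independent of the controller, and thus it should be possible to approximate the behavior in between the samples independently of the controller by "approximating" the aforementioned operators. To illustrate this point further, we first decompose $\tilde G$ as

$$\tilde G = \begin{bmatrix} \hat D_{11} & 0 \\ 0 & 0 \end{bmatrix} + \tilde G_0, \qquad \tilde G_0 := \left[\begin{array}{c|cc} \hat A & \hat B_1 & \hat B_2 \\ \hline \hat C_1 & 0 & \hat D_{12} \\ C_2 & 0 & 0 \end{array}\right]$$

and we note that $\tilde G_0$ can be further decomposed as

$$\tilde G_0 = \begin{bmatrix} [\hat C_1\ \ \hat D_{12}] & 0 \\ 0 & I \end{bmatrix} \tilde G_{00} \begin{bmatrix} \hat B_1 & 0 \\ 0 & I \end{bmatrix}, \qquad \tilde G_{00} := \left[\begin{array}{c|cc} \hat A & I & \hat B_2 \\ \hline I & 0 & 0 \\ 0 & 0 & I \\ C_2 & 0 & 0 \end{array}\right]. \qquad (7)$$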

This decomposition is illustrated in Fig. 6. The closed-loop mapping $\mathcal F(\tilde G, C)$ is correspondingly decomposed as

$$\mathcal F(\tilde G, C) = \hat D_{11} + \mathcal F(\tilde G_0, C) = \hat D_{11} + [\hat C_1\ \ \hat D_{12}]\, \mathcal F(\tilde G_{00}, C)\, \hat B_1. \qquad (8)$$

We will use the notation $\mathcal O := [\hat C_1\ \ \hat D_{12}]$, and call $\mathcal O$ the output operator and $\hat B_1$ the input operator.

With this decomposition, $\tilde G_{00}$ is finite dimensional, and $\mathcal O$, $\hat B_1$ are finite rank operators

$$\mathcal O: R^{n_x + n_u} \to L^\infty[0,\tau], \qquad \hat B_1: L^\infty[0,\tau] \to R^{n_x}.$$

As (8) shows, only a finite-dimensional part of the system (i.e., $\mathcal F(\tilde G_{00}, C)$) is dependent on the controller, while the infinite-dimensional parts are independent of $C$. Roughly speaking, the controller (being discrete time) only affects the hybrid system at the sampling instants, while in between the samples, the system's evolution is governed by the operators $\hat D_{11}$, $\mathcal O$, $\hat B_1$, which are in turn dependent only on the dynamics of the original generalized plant $G$.

The  remainder  of  this  section  and  the  appendixes  are 
devoted  to  deriving  the  main  inequality,  and  can  be 
skipped  without  loss  of  continuity. 

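Fig. 6. Decomposition of $\tilde G$.

We now consider the issue of "approximating" the infinite-dimensional plant $\tilde G$ by a finite-dimensional plant $\tilde G_n$. First we note that the two norms to be compared are of $\mathcal F(\tilde G, C)$, which has $\ell_{L^\infty[0,\tau]}$ as an input-output space, and of $\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n$, which has $\ell^\infty(n)$ as an input-output space. Therefore, it is not strictly true that $\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n$ approximates $\mathcal F(\tilde G, C)$, since comparisons like $\|\mathcal F(\tilde G, C) - \mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n\| \le \varepsilon$ do not make sense. We will replace $\mathcal S_n \mathcal F(\tilde G, C) \mathcal H_n$ by another system which has the same norm, but truly approximates $\mathcal F(\tilde G, C)$.

Define the following operator (the normalized integration operator) $\mathcal I_n: L^\infty[0,\tau] \to \ell^\infty(n)$ by

$$(\mathcal I_n u)(i) := \frac{n}{\tau} \int_{i\tau/n}^{(i+1)\tau/n} u(s)\,ds.$$

The following properties of $\mathcal I_n$ can be easily checked: $\mathcal I_n$ is a linear operator, $\|\mathcal I_n\| = 1$, and $\mathcal I_n$ is a left inverse to $\mathcal H_n$, i.e., $\mathcal I_n \mathcal H_n = $ identity. If $\mathcal I_n$ is regarded as an operator on $L^1[0,\tau]$, i.e., $\mathcal I_n: L^1[0,\tau] \to \ell^1(n)$, then it is easily shown that $\mathcal H_n$ is the adjoint of $(\tau/n)\mathcal I_n$, that is, $((\tau/n)\mathcal I_n)^* = \mathcal H_n$. Similarly, if $\mathcal H_n$ is regarded as an operator on $\ell^1(n)$, i.e., $\mathcal H_n: \ell^1(n) \to L^1[0,\tau]$, then $\mathcal H_n^* = (\tau/n)\mathcal I_n$, which also implies that $(\mathcal H_n \mathcal I_n)^* = \mathcal H_n \mathcal I_n$.

Let us denote by $T := \mathcal F(\tilde G, C)$, and by $T_n := \mathcal F(\tilde G_n, C)$. As already mentioned, $T$ and $T_n$ cannot be compared directly since they do not have the same input and output space. The operator $\mathcal I_n$ will allow us to form a system $\bar T_n$ with norm equal to that of $T_n$, but with the same input and output spaces as $T$.

Lemma 2: Define the system $\bar T_n := \mathcal H_n T_n \mathcal I_n = (\mathcal H_n \mathcal S_n)\, T\, (\mathcal H_n \mathcal I_n)$, then

$$\|\bar T_n\| = \|T_n\|.$$

Proof: It is true that $\|\mathcal S_n T \mathcal H_n \mathcal I_n\| = \|\mathcal S_n T \mathcal H_n\|$ since

$$\|\mathcal S_n T \mathcal H_n \mathcal I_n\| \le \|\mathcal S_n T \mathcal H_n\|\, \|\mathcal I_n\| \le \|\mathcal S_n T \mathcal H_n\|,$$

and

$$\|\mathcal S_n T \mathcal H_n\| = \|\mathcal S_n T \mathcal H_n \mathcal I_n \mathcal H_n\| \le \|\mathcal S_n T \mathcal H_n \mathcal I_n\|\, \|\mathcal H_n\| \le \|\mathcal S_n T \mathcal H_n \mathcal I_n\|.$$

Also, since $\mathcal H_n: \ell^\infty(n) \to L^\infty[0,\tau]$ is an isometry, we conclude that

$$\|\bar T_n\| = \|\mathcal H_n \mathcal S_n T \mathcal H_n \mathcal I_n\| = \|\mathcal S_n T \mathcal H_n \mathcal I_n\| = \|\mathcal S_n T \mathcal H_n\| = \|T_n\|.$$

Remark: The above lemma is of general interest since it provides a systematic way of addressing the question of how a discretized system $\mathcal S_n H \mathcal H_n$ "approximates" the original system $H$, by comparing the systems $H$ and $\bar H := (\mathcal H_n \mathcal S_n)\, H\, (\mathcal H_n \mathcal I_n)$. This comparison is typically easier since $H$ and $\bar H$ are both continuous-time systems with the same input and output spaces.

Let $\bar G_n$ be the generalized plant corresponding to the closed-loop operator $\bar T_n$, i.e., $\bar T_n = \mathcal F(\bar G_n, C)$. $\bar G_n$ is defined by

The consequence of Lemma 2 is that one only needs to show inequality (5) with $\mathcal F(\bar G_n, C)$ instead of $\mathcal F(\tilde G_n, C)$. As already mentioned, the advantage is that $\mathcal F(\bar G_n, C)$ has the same input and output spaces as $\mathcal F(\tilde G, C)$, namely $\ell_{L^\infty[0,\tau]}$.

Next, we will show that $\mathcal F(\bar G_n, C)$ actually approximates $\mathcal F(\tilde G, C)$, and this will yield the main inequality (5).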
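The properties of the normalized integration operator used in Lemma 2 can be checked numerically. The sketch below is illustrative only: the names `hold`, `avg`, and `integrate` are hypothetical, integrals are approximated by a midpoint rule, and the test functions are arbitrary. It checks that averaging a held sequence returns the sequence, that averaging does not increase the maximum norm, and that the pairing identity behind $((\tau/n)\mathcal I_n)^* = \mathcal H_n$ holds.

```python
import math

def hold(v, tau):
    """(H_n v)(t) = v[floor(t n / tau)] on [0, tau)."""
    n = len(v)
    return lambda t: v[min(int(t * n / tau), n - 1)]

def avg(u, tau, n, sub=400):
    """(I_n u)(i) = (n / tau) * integral of u over [i tau/n, (i+1) tau/n), midpoint rule."""
    h = tau / n
    return [sum(u(i * h + (k + 0.5) * h / sub) for k in range(sub)) / sub
            for i in range(n)]

def integrate(g, tau, n, sub=400):
    """Midpoint-rule integral of g over [0, tau] on the same nodes as `avg`."""
    h = tau / n
    return sum(g(i * h + (k + 0.5) * h / sub) for i in range(n) for k in range(sub)) * h / sub

tau, n = 1.5, 6                                   # illustrative values
v = [(-1) ** i * (i + 1) for i in range(n)]       # arbitrary sequence in l^inf(n)
f = lambda t: math.sin(5.0 * t) + 0.3             # arbitrary bounded function

left_inverse = avg(hold(v, tau), tau, n)          # I_n H_n v, expected to equal v
avg_norm = max(abs(x) for x in avg(f, tau, n))    # expected <= sup |f| <= 1.3

# Pairing identity: <(tau/n) I_n f, v> = integral of f * (H_n v).
lhs = (tau / n) * sum(a * b for a, b in zip(avg(f, tau, n), v))
rhs = integrate(lambda t: f(t) * hold(v, tau)(t), tau, n)
print(left_inverse, avg_norm, lhs, rhs)
```

The pairing identity is what lets the output-side argument with $\mathcal H_n \mathcal S_n$ be repeated on the input side after passing to preadjoints.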

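Approximation of $\tilde G$: The approximation of $\tilde G$ will be done in two parts corresponding to the decomposition $\mathcal F(\tilde G, C) = \hat D_{11} + \mathcal F(\tilde G_0, C) = \hat D_{11} + \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1$. It will be useful in this section to use a shorthand notation for (see Fig. 7)

$$T_0 := \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1, \qquad T_{00} := \mathcal F(\tilde G_{00}, C), \qquad (9)$$

$$\bar T_{0n} := (\mathcal H_n \mathcal S_n)\, T_0\, (\mathcal H_n \mathcal I_n), \qquad \bar D_n := (\mathcal H_n \mathcal S_n)\, \hat D_{11}\, (\mathcal H_n \mathcal I_n), \qquad (10)$$

and corresponding to the decomposition $T = \hat D_{11} + T_0$, we have

$$\bar T_n = (\mathcal H_n \mathcal S_n)(\hat D_{11} + T_0)(\mathcal H_n \mathcal I_n) = \bar D_n + \bar T_{0n}.$$

We will first show that $\bar T_{0n}$ approximates $T_0$, then we show that $\bar D_n$ approximates $\hat D_{11}$.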

Proposition 3: Let $n > 2n_x$, such that $\tau/n$ is not a pathological sampling period; there exists a constant $K_0$ which depends only on $G$, such that

$$\|T_0 - \bar T_{0n}\| \le \frac{K_0}{n} \|\bar T_{0n}\|.$$

Remark: It is important that the above bound is in terms of $\|\bar T_{0n}\|$, which corresponds to part of $\mathcal F(\bar G_n, C)$. The reason is that in the main inequality, we must bound the norm of the hybrid system from above by the norm of the discretized system $\mathcal F(\tilde G_n, C)$. In fact, it is much easier to produce an inequality as above but with $\|T_0\|$ on the right-hand side, but this would not be useful for bounding the norm of the hybrid system.

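Proof: The proof makes use of the decomposition of $T_0 = \mathcal O T_{00} \hat B_1$, and of its approximation $\bar T_{0n} = (\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1 (\mathcal H_n \mathcal I_n)$. The basic idea of the proof (on the output side) is that $(\mathcal H_n \mathcal S_n)$ operates on functions in $\mathcal R(\mathcal O) \subset L^\infty[0,\tau]$, and functions in $\mathcal R(\mathcal O)$ are continuous and there are bounds on their rate of change (depending on the dynamics of the plant), so on $\mathcal R(\mathcal O)$ the operator $(\mathcal H_n \mathcal S_n)$ approximates the identity, and it also has a left inverse which approximates the identity as $n \to \infty$.

Fig. 7. Decomposition of the approximate system $\mathcal F(\bar G_n, C)$.

We now approximate from the output side. Lemma 4 below states that $(\mathcal H_n \mathcal S_n)$ has a left inverse on $\mathcal R(\mathcal O)$, i.e., there exists $(\mathcal H_n \mathcal S_n)^{-L}: \mathcal R(\mathcal H_n \mathcal S_n \mathcal O) \to \mathcal R(\mathcal O) \subset L^\infty[0,\tau]$ such that $(\mathcal H_n \mathcal S_n)^{-L} (\mathcal H_n \mathcal S_n) = $ identity on $\mathcal R(\mathcal O)$. We now establish

$$\|(\mathcal H_n \mathcal S_n) T_0 - T_0\| = \|(\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1 - (\mathcal H_n \mathcal S_n)^{-L} (\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1\| \le \|(I - (\mathcal H_n \mathcal S_n)^{-L})|_{\mathcal R(\mathcal H_n \mathcal S_n \mathcal O)}\|\, \|(\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1\|$$

where the operator $I$ is the identity, or the embedding $I: \mathcal R(\mathcal H_n \mathcal S_n \mathcal O) \to L^\infty[0,\tau]$. Also from Lemma 4, we have that $\|(I - (\mathcal H_n \mathcal S_n)^{-L})|_{\mathcal R(\mathcal H_n \mathcal S_n \mathcal O)}\| \le K_{\mathcal O}/n$; this implies

$$\|(\mathcal H_n \mathcal S_n) T_0 - T_0\| \le \frac{K_{\mathcal O}}{n} \|(\mathcal H_n \mathcal S_n) T_0\|. \qquad (11)$$

Now, to approximate on the input side, we need to take preadjoints (see Appendix B), denoted by a star on the left:

$$\|(\mathcal H_n \mathcal S_n) T_0 - (\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\| = \|(\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1 - (\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1 (\mathcal H_n \mathcal I_n)\|$$
$$= \|(\mathcal H_n \mathcal S_n) \mathcal O T_{00} (\hat B_1 - \hat B_1 (\mathcal H_n \mathcal I_n))\|$$
$$= \|({}^*\hat B_1 - {}^*(\hat B_1 \mathcal H_n \mathcal I_n))\ {}^*((\mathcal H_n \mathcal S_n) \mathcal O T_{00})\|$$
$$= \|({}^*\hat B_1 - (\mathcal H_n \mathcal I_n)\, {}^*\hat B_1)\ {}^*((\mathcal H_n \mathcal S_n) \mathcal O T_{00})\|.$$

From Lemma 4 below, $(\mathcal H_n \mathcal I_n)$ has a left inverse when restricted to $\mathcal R({}^*\hat B_1)$, i.e., $(\mathcal H_n \mathcal I_n)^{-L}$ is such that $(\mathcal H_n \mathcal I_n)^{-L} (\mathcal H_n \mathcal I_n) = $ identity on $\mathcal R({}^*\hat B_1) \subset L^1[0,\tau]$, therefore

$$\|(\mathcal H_n \mathcal S_n) T_0 - (\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\| = \|((\mathcal H_n \mathcal I_n)^{-L} (\mathcal H_n \mathcal I_n)\, {}^*\hat B_1 - (\mathcal H_n \mathcal I_n)\, {}^*\hat B_1)\ {}^*((\mathcal H_n \mathcal S_n) \mathcal O T_{00})\|$$
$$= \|((\mathcal H_n \mathcal I_n)^{-L} - I)|_{\mathcal R(\mathcal H_n \mathcal I_n {}^*\hat B_1)}\ {}^*((\mathcal H_n \mathcal S_n) \mathcal O T_{00} \hat B_1 (\mathcal H_n \mathcal I_n))\|$$
$$\le \|((\mathcal H_n \mathcal I_n)^{-L} - I)|_{\mathcal R(\mathcal H_n \mathcal I_n {}^*\hat B_1)}\|\ \|(\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\|$$
$$\le \frac{K_B}{n} \|\bar T_{0n}\| \qquad (12)$$

where the last step is again from Lemma 4.

Combining inequalities (11) and (12), we get

$$\|T_0 - \bar T_{0n}\| = \|T_0 - (\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\|$$
$$= \|T_0 - (\mathcal H_n \mathcal S_n) T_0 + (\mathcal H_n \mathcal S_n) T_0 - (\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\|$$
$$\le \|T_0 - (\mathcal H_n \mathcal S_n) T_0\| + \|(\mathcal H_n \mathcal S_n) T_0 - (\mathcal H_n \mathcal S_n) T_0 (\mathcal H_n \mathcal I_n)\|$$
$$\le \frac{K_{\mathcal O}}{n} \|(\mathcal H_n \mathcal S_n) T_0\| + \frac{K_B}{n} \|\bar T_{0n}\|$$

but (12) also implies that $\|(\mathcal H_n \mathcal S_n) T_0\| \le (1 + (K_B/n)) \|\bar T_{0n}\|$, therefore

$$\|T_0 - \bar T_{0n}\| \le \left[\frac{K_{\mathcal O}}{n}\left(1 + \frac{K_B}{n}\right) + \frac{K_B}{n}\right] \|\bar T_{0n}\| \le \frac{K_0}{n} \|\bar T_{0n}\|,$$

where $K_0 := K_{\mathcal O} + K_{\mathcal O} K_B + K_B$. ∎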

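Lemma 4 below captures the idea that $(\mathcal H_n \mathcal S_n) \mathcal O$ approximates $\mathcal O$, because the sampling operator $\mathcal S_n$ samples only elements in $\mathcal R(\mathcal O)$, and since there is a bound on the variation of functions in $\mathcal R(\mathcal O)$, one can get a bound on how well $(\mathcal H_n \mathcal S_n)$ approximates elements in $\mathcal R(\mathcal O)$. Similar arguments are made about $(\mathcal H_n \mathcal I_n)\, {}^*\hat B_1$. This lemma is the key to obtaining approximations that are independent of the controllers: since the behavior of the signals in the input and output spaces is governed by $\mathcal O$ and $\hat B_1$, the nature of the approximation depends on these two operators and not on $C$. The rate of convergence of the approximations is determined by the constants $K_B$, $K_{\mathcal O}$, which are completely determined by the operators $\hat B_1$ and $\mathcal O$, respectively, which in turn are completely determined by the original plant.

Lemma 4: Assume $n > 2n_x$, and $\tau/n$ is not a pathological sampling period, then

a) there exists an operator $(\mathcal H_n \mathcal I_n)^{-L}: \mathcal R(\mathcal H_n \mathcal I_n {}^*\hat B_1) \to \mathcal R({}^*\hat B_1)$, where both ranges are regarded as subspaces of $L^1[0,\tau]$, such that $(\mathcal H_n \mathcal I_n)^{-L} (\mathcal H_n \mathcal I_n) = $ identity on $\mathcal R({}^*\hat B_1)$, and a constant $K_B$, such that

$$\|(I - (\mathcal H_n \mathcal I_n)^{-L})|_{\mathcal R(\mathcal H_n \mathcal I_n {}^*\hat B_1)}\| \le \frac{K_B}{n}.$$

b) there exists an operator $(\mathcal H_n \mathcal S_n)^{-L}: \mathcal R(\mathcal H_n \mathcal S_n \mathcal O) \to \mathcal R(\mathcal O)$, where both ranges are regarded as subspaces of $L^\infty[0,\tau]$, such that $(\mathcal H_n \mathcal S_n)^{-L} (\mathcal H_n \mathcal S_n) = $ identity on $\mathcal R(\mathcal O)$, and a constant $K_{\mathcal O}$ such that

$$\|(I - (\mathcal H_n \mathcal S_n)^{-L})|_{\mathcal R(\mathcal H_n \mathcal S_n \mathcal O)}\| \le \frac{K_{\mathcal O}}{n}.$$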

The  proofs  of  this  lemma  and  the  next  one  are  quite 
technical  and  involved,  and  thus  are  relegated  to  the 
appendix. 

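The next lemma takes care of approximating the direct feed-through operator $\hat D_{11}$, which is approximated by the direct feed-through operator $\bar D_n$ of $\bar G_n$.

Lemma 5: There is a constant $K_D$ such that

$$\|\hat D_{11} - \bar D_n\| \le \frac{K_D}{n}.$$

Combining Proposition 3 and Lemma 5, we get that $\bar T_n$ approximates $T$ by

$$\|T - \bar T_n\| \le \frac{K_0}{n} \|\bar T_{0n}\| + \frac{K_D}{n}. \qquad (13)$$

To get a bound with $\|\bar T_n\|$ on the right, note that $\bar T_n = \bar T_{0n} + \bar D_n$, which implies by the triangle inequality that $\|\bar T_{0n}\| - \|\bar D_n\| \le \|\bar T_n\|$ and

$$\|\bar T_{0n}\| \le \|\bar D_n\| + \|\bar T_n\| \le \|\hat D_{11}\| + \|\bar T_n\|.$$

Since $\|\hat D_{11}\|$ is a constant, combining with (13) yields

$$\|T - \bar T_n\| \le \frac{K_0}{n} \|\hat D_{11}\| + \frac{K_0}{n} \|\bar T_n\| + \frac{K_D}{n}.$$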

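Finally, since $\|T\| - \|\bar T_n\| \le \|T - \bar T_n\|$, we get

$$\|T\| \le \frac{K_0 \|\hat D_{11}\| + K_D}{n} + \left(1 + \frac{K_0}{n}\right) \|\bar T_n\|,$$

and since $\|\bar T_n\| = \|T_n\|$ by Lemma 2, we have arrived at the main inequality (5), with $K_1 := K_0 \|\hat D_{11}\| + K_D$.


VI. Geometrical Interpretations

In the previous section we gave an approximation procedure to obtain approximately optimal controllers. The procedure is based on forming an approximate finite-dimensional system to an infinite-dimensional one. A question may be asked as to whether the infinite-dimensional problem may be exactly reducible to a finite-dimensional $\ell^1$ problem. For example, in [1], the $\mathcal H^\infty$ sampled-data problem was treated by the lifting technique, and an exact reduction of the resulting infinite-dimensional problem to a finite-dimensional one is possible. This motivates the question as to whether a similar exact reduction is possible in the $\ell^1$ problem.

In this section, we will not give a definite answer to this question, but it is our purpose to illustrate some of the underlying geometry in the reduction, and to suggest that the $\ell^1$ sampled-data problem may not be exactly reducible to a finite-dimensional $\ell^1$ problem. We will give a geometric reasoning which shows that the fundamental difference between the reduction of the $\mathcal H^\infty$ and the $\ell^1$ sampled-data problems has to do with the difference between the geometry of finite-dimensional Hilbert and Banach spaces.

Let us go back to the formulation of the problem involving the infinite-dimensional generalized plant $\tilde G$, and consider the decomposition of $\tilde G$ in feedback with the controller $C$ (Fig. 6).

To facilitate the geometric arguments we are about to make, we assume that the operator $\hat D_{11} = 0$. Note that this assumption is valid only when $G_{11} = 0$, and this is an unrealistic assumption for most interesting control problems, but the assumption is made for the purpose of illustration. With the assumption $\hat D_{11} = 0$, the decomposed system in feedback with $C$ is shown in Fig. 8, where $\mathcal O := [\hat C_1\ \ \hat D_{12}]$.

Fig. 8. Decomposition of $\tilde G$ with $\hat D_{11} = 0$.

We first look at possible decompositions of the output space $L^\infty[0,\tau]$. From Fig. 8, it is clear that

$$\mathcal F(\tilde G, C) = \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1$$

which means that the output signal $\hat z$ takes values in $\mathcal R(\mathcal O) \subset L^\infty[0,\tau]$ (at each point in time). Since $\mathcal O: R^{n_x + n_u} \to L^\infty[0,\tau]$, then $\mathcal R(\mathcal O)$ is a finite-dimensional subspace of $L^\infty[0,\tau]$, and there exists a projection on it, $\Pi_{\mathcal R(\mathcal O)}: L^\infty[0,\tau] \to \mathcal R(\mathcal O)$ [20]. By the definition of a projection, we have that for any $x \in R^{n_x + n_u}$, $\|\mathcal O x\|_{L^\infty[0,\tau]} = \|\Pi_{\mathcal R(\mathcal O)} \mathcal O x\|_{\mathcal R(\mathcal O)}$, therefore

$$\|\mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1\| = \|\Pi_{\mathcal R(\mathcal O)} \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1\|.$$

Note that $\Pi_{\mathcal R(\mathcal O)} \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1$ is a system with a finite-dimensional output space, namely $\mathcal R(\mathcal O)$, and the norm on $\mathcal R(\mathcal O)$ is the norm it inherits as a subspace of $L^\infty[0,\tau]$.

A similar reduction is possible with the input space; for this, we need to look at the preadjoint operators. Since for any Banach space operator $A$, $\|A\| = \|A^*\|$, we have that

$$\|\Pi_{\mathcal R(\mathcal O)} \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1\| = \|{}^*\hat B_1\ {}^*\mathcal F(\tilde G_{00}, C)\ {}^*(\Pi_{\mathcal R(\mathcal O)} \mathcal O)\|$$

and as before, we can project on $\mathcal R({}^*\hat B_1) \subset L^1[0,\tau]$ without changing the induced norm

$$= \|\Pi_{\mathcal R({}^*\hat B_1)}\ {}^*\hat B_1\ {}^*\mathcal F(\tilde G_{00}, C)\ {}^*(\Pi_{\mathcal R(\mathcal O)} \mathcal O)\|$$
$$= \|\Pi_{\mathcal R(\mathcal O)} \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1 \Pi^*_{\mathcal R({}^*\hat B_1)}\|$$

where the last equality follows by taking the adjoints. Also, note that since $\Pi_{\mathcal R({}^*\hat B_1)}: L^1[0,\tau] \to \mathcal R({}^*\hat B_1)$, then $\Pi^*_{\mathcal R({}^*\hat B_1)}: (\mathcal R({}^*\hat B_1))^* \to L^\infty[0,\tau]$, where $(\mathcal R({}^*\hat B_1))^*$ is the dual space of $\mathcal R({}^*\hat B_1)$, and it is finite-dimensional since $\mathcal R({}^*\hat B_1)$ is.

Combining the reduction on both the input and the output spaces, we have

$$\|\mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1\| = \|\Pi_{\mathcal R(\mathcal O)} \mathcal O \mathcal F(\tilde G_{00}, C) \hat B_1 \Pi^*_{\mathcal R({}^*\hat B_1)}\| =: \|\mathcal F(\check G, C)\|, \qquad (14)$$

where $\check G$ is defined by

$$\check G := \begin{bmatrix} \Pi_{\mathcal R(\mathcal O)} & 0 \\ 0 & I \end{bmatrix} \tilde G_0 \begin{bmatrix} \Pi^*_{\mathcal R({}^*\hat B_1)} & 0 \\ 0 & I \end{bmatrix}.$$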


Equation  (14)  shows  that  the  original  problem  is  reducibte 
to  the  standard  problem  with  the  generalized  plant  G. 
Since  G  has  finite-dimensional  input  and  output  spaces 
(since  miff)  and  im^Bj)*  are  finite  dimensional), 
we  have  arrived  at  an  equivalent  finite-dimensional  prob¬ 
lem.  This  problem  is  not  necessarily  a  standard  finite¬ 
dimensional  /'  problem,  it  is  only  so  if  the  input  and 
output  spaces  imiff)  and  imi*Bx)Y)  are  linearly 
isometrically  isomorphic  to  an  Fin)  space  for  some  n. 

Remark: In the $\mathcal{H}^\infty$ sampled-data problem, the situation is much simpler. In that case, $\mathcal{R}(\mathcal{O})$ and $(\mathcal{R}({}^*\hat{B}_1))^*$, as subspaces of $L^2[0, \tau]$, are immediately linearly isometric to Euclidean spaces (that is, $l^2(n)$), since every finite-dimensional Hilbert space is linearly isometric to a Euclidean space of equal dimension.
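The Hilbert-space fact used in this remark can be checked numerically. The following is a minimal Python sketch, not part of the original development: the basis functions `x1` and `x2`, the interval length `TAU`, and the midpoint quadrature are all illustrative assumptions. It factors the Gram matrix of the basis as $LL'$ and maps a coefficient vector $\alpha$ to $L'\alpha$, whose Euclidean norm then equals the $L^2[0,\tau]$ norm of $\alpha_1 x_1 + \alpha_2 x_2$.

```python
import math

TAU = 1.0      # assumed interval length (hypothetical)
M = 2000       # midpoint-rule nodes used for every integral below

def x1(t):     # hypothetical basis function
    return math.cos(t)

def x2(t):     # hypothetical basis function
    return math.sin(t)

NODES = [(k + 0.5) * TAU / M for k in range(M)]

def inner(f, g):
    """Midpoint-rule approximation of the L2[0, TAU] inner product."""
    return sum(f(t) * g(t) for t in NODES) * TAU / M

def cholesky_2x2(g11, g12, g22):
    """Lower-triangular factor of a 2x2 positive definite Gram matrix."""
    l11 = math.sqrt(g11)
    l21 = g12 / l11
    l22 = math.sqrt(g22 - l21 * l21)
    return l11, l21, l22

L11, L21, L22 = cholesky_2x2(inner(x1, x1), inner(x1, x2), inner(x2, x2))

def to_euclidean(a1, a2):
    """Isometry: coefficients (a1, a2) -> L' (a1, a2) in R^2."""
    return (L11 * a1 + L21 * a2, L22 * a2)

def l2_norm(a1, a2):
    """L2[0, TAU] norm of a1*x1 + a2*x2, using the same quadrature."""
    f = lambda t: a1 * x1(t) + a2 * x2(t)
    return math.sqrt(inner(f, f))

def euclid_norm(v):
    """Euclidean norm of a vector in R^2."""
    return math.hypot(v[0], v[1])
```

For any coefficients, `l2_norm(a1, a2)` and `euclid_norm(to_euclidean(a1, a2))` agree up to rounding. No analogous linear map exists in general for the $L^\infty$ norm, which is the point of the discussion that follows.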

Thus, the question arises as to what the spaces $\mathcal{R}(\mathcal{O})$ and $(\mathcal{R}({}^*\hat{B}_1))^*$ look like, and whether they are isometric to $l^\infty(n)$. If the answer is affirmative, we can use this identification with $l^\infty(n)$ and obtain a generalized plant which has an $l^\infty(n)$ for each of its input and output spaces, and the problem then becomes a standard $l^1$ problem. However, the answer is negative. This can be seen by a simple example, where we plot the unit ball of the space and show that there is no linear transformation that can transform it to a unit ball of an $l^\infty(n)$ space.

The example we consider is as follows: first recall that the operator $\mathcal{O}$ is given by the following kernel function

$$\mathcal{O}(t) = [\hat{C}_1(t)\ \ \hat{D}_{12}(t)] = \left[C_1 e^{At}\ \ \ C_1\left(\int_0^t e^{As}\,ds\right)B_2\right].$$

We will consider the subspace $\mathcal{R}(\hat{C}_1) \subset \mathcal{R}(\mathcal{O})$ and show that it cannot be a subspace of any $l^\infty(n)$. Recall that the norm on the space $\mathcal{R}(\hat{C}_1)$ is the norm inherited as a subspace of $L^\infty[0, \tau]$. The unit ball in $\mathcal{R}(\hat{C}_1)$ can be plotted by choosing a basis, and then computing the $L^\infty[0, \tau]$ norm for combinations of the basis elements. The particular example we pick is

A  =  \°1  ~1  ;  C  =  [1  1/21 

with $\tau = 1$. For this example $\mathcal{R}(\hat{C}_1)$ has dimension two, and a basis for it is given by

$$x_1(t) := C_1 e^{At}\begin{bmatrix} 1 \\ 0 \end{bmatrix}; \qquad x_2(t) := C_1 e^{At}\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

To plot the unit ball in $\mathcal{R}(\hat{C}_1)$, we represent any $x \in \mathcal{R}(\hat{C}_1)$ by $x = \alpha_1 x_1 + \alpha_2 x_2$. The ball in Fig. 9 represents $\|x\| = 1$, and the axes are $\alpha_1$ and $\alpha_2$.

Fig. 9. The unit ball of $\mathcal{R}(\hat{C}_1)$.

The unit ball in an $l^\infty(n)$ space is an $n$-cube, and the unit ball of any 2-dimensional subspace of $l^\infty(n)$ is a 2-dimensional slice through an $n$-cube, and it is clear that the boundary of this 2-dimensional slice must be made up of straight lines, i.e., it must be a polygon. Now, for $\mathcal{R}(\hat{C}_1)$ to be linearly isometric to a subspace of $l^\infty(n)$, a necessary condition is that its unit ball [that of $\mathcal{R}(\hat{C}_1)$] must be linearly transformable to a polygon, which means that it should itself be a polygon. Since the unit ball of the particular example in Fig. 9 is not a polygon, we conclude that $\mathcal{R}(\hat{C}_1)$ [and consequently $\mathcal{R}(\mathcal{O})$] is not linearly isometrically isomorphic to an $l^\infty(n)$ space for any $n$.

We end this section with a geometric interpretation of the approximation procedure given previously. If we apply the approximation procedure to the system in Fig. 8, the result is the system

$$\mathcal{S}_n\mathcal{O}\,\mathcal{F}(G_{oo}, C)\hat{B}_1\mathcal{H}_n. \qquad (15)$$

Looking only at the output side (the input side can be interpreted similarly using adjoints), the norm on the output side is essentially measured by sampling the elements in $\mathcal{R}(\mathcal{O})$; that is, the norm of a function $f \in \mathcal{R}(\mathcal{O})$ is computed by taking the $l^\infty(n)$ norm of $n$ samples. As before, we can plot the unit ball of $\mathcal{R}(\hat{C}_1)$ in this new norm, which we will call the "samples norm." (Actually, we will plot the coefficients $\alpha_1, \alpha_2$, hence the plot is two dimensional.) This norm approximates the actual norm on $\mathcal{R}(\hat{C}_1)$ for large $n$. This approximation can be seen in Fig. 10 (for $n = 3$), where the samples norm unit ball is superimposed over the actual unit ball of $\mathcal{R}(\hat{C}_1)$. It is interesting to see that what is being done is approximation of the unit ball of $\mathcal{R}(\mathcal{O})$ by polygons. Thus the approximation procedure for solving the sampled-data problem can be interpreted as an approximation of norms of the input and output spaces. It is interesting to note here that the unit balls of $\mathcal{R}(\mathcal{O})$ and $(\mathcal{R}({}^*\hat{B}_1))^*$ generally represent nonlinear constraints, very much as in the continuous-time $L^1$ problem [6], while in discrete-time $l^1$ problems, the constraints are always linear. Therefore, the fact that the norms in the sampled-data problem represent nonlinear constraints (roughly speaking) seems to be a consequence of the continuous-time nature of the problem (just as in the $L^1$ problem). However, by essentially approximating the nonlinear constraints by linear ones, we are able to reduce the problem to a standard discrete-time $l^1$ problem.

Fig. 10. The unit balls of $\mathcal{R}(\hat{C}_1)$ with the actual and the samples norms.
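The comparison between the actual norm and the samples norm can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: the basis functions `x1`, `x2` are hypothetical stand-ins (they are not the basis of the example above), `TAU = 1.0` is assumed, and the $L^\infty[0,\tau]$ norm is approximated on a fine grid. The samples are taken at $t = i\tau/n$, $i = 0, \ldots, n-1$, which is one plausible reading of the sampling described in the text.

```python
import math

TAU = 1.0     # assumed sampling period (hypothetical)
GRID = 2000   # fine grid standing in for the continuum [0, TAU]

def x1(t):    # hypothetical basis function
    return math.cos(t)

def x2(t):    # hypothetical basis function
    return math.sin(t)

def combo(a1, a2, t):
    """Value at time t of the element with coefficients (a1, a2)."""
    return a1 * x1(t) + a2 * x2(t)

def actual_norm(a1, a2):
    """Grid approximation of the L-infinity[0, TAU] norm."""
    return max(abs(combo(a1, a2, k * TAU / GRID)) for k in range(GRID + 1))

def samples_norm(a1, a2, n):
    """l-infinity(n) norm of the n samples at t = i*TAU/n, i = 0..n-1."""
    return max(abs(combo(a1, a2, i * TAU / n)) for i in range(n))

def worst_gap(n, directions=360):
    """Largest (actual - samples) gap over unit coefficient vectors."""
    gap = 0.0
    for d in range(directions):
        th = 2.0 * math.pi * d / directions
        a1, a2 = math.cos(th), math.sin(th)
        gap = max(gap, actual_norm(a1, a2) - samples_norm(a1, a2, n))
    return gap
```

Since the samples norm is a maximum of finitely many linear functionals of $(\alpha_1, \alpha_2)$, its unit ball is a polygon, and because the samples norm never exceeds the actual norm, that polygon contains the actual unit ball. Calling `worst_gap(n)` for increasing `n` shows the gap shrinking, which is the polygonal approximation described above.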

Finally, we point out that the mathematical reason behind the difference in the reductions of the $\mathcal{H}^\infty$ and $l^1$ sampled-data problems is that in the former, any finite-dimensional Hilbert space is linearly isometric to $l^2(n)$, while in the latter, not every finite-dimensional Banach space is linearly isometric to $l^\infty(n)$. This reflects the fact that the isometric class of Banach spaces of dimension $n$ is a much richer class (there is an infinite number of them, for example $l^p(n)$ for $1 \le p \le \infty$) than the class of Hilbert spaces of dimension $n$ [of which there is only one, $l^2(n)$].

VII.  Conclusions 

This paper provides a solution for the sampled-data $l^1$ problem through approximation. Utilizing lifting techniques, the input/output map is decomposed in such a way that the infinite-dimensional part of the system is isolated independently of the controller. This part is then approximated in a precise way by a finite-dimensional system, whose dimension can be determined given any degree of accuracy. Computable bounds on the norm of the difference of the actual system and the approximated system are furnished, and they all depend entirely on the system's data. It is shown that the rate of convergence of this approximation is $(1/n)$.
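The $(1/n)$ rate can be observed on a toy problem. The following Python sketch is an illustration only: it does not implement the paper's operators, and the test function `f`, the interval length `TAU`, and the quadrature are assumptions. It replaces a smooth function by its averages over $n$ equal subintervals and measures the $L^1[0,\tau]$ error, which is the kind of quantity bounded in the proof of Lemma 4 a).

```python
import math

TAU = 1.0   # assumed interval length (hypothetical)

def f(t):   # hypothetical smooth test function
    return math.exp(-t)

def l1_error(n, m=400):
    """L1[0, TAU] distance between f and its piecewise-constant
    approximation by averages over n equal subintervals. Every integral
    uses an m-point midpoint rule on each subinterval."""
    h = TAU / n
    total = 0.0
    for i in range(n):
        vals = [f(i * h + (k + 0.5) * h / m) for k in range(m)]
        avg = sum(vals) / m
        total += sum(abs(v - avg) for v in vals) * h / m
    return total
```

Doubling `n` roughly halves `l1_error(n)`, consistent with a $1/n$ rate, and the error stays below $\tau^2 \sup|f'| / (2n)$, the form of the bound obtained from the integral inequalities of Appendix B.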

It is interesting to note that the same approach and approximation arguments in this paper can be followed to obtain bounds like the main inequality for the $L^1$-induced norm sampled-data problem. A combination of this with the Riesz-Thorin convexity theorem would then show that the main inequality (with different constants) holds for general $L^p$-induced norm problems. In particular this holds for the $L^2$-induced norm case. In this case, this approximation procedure was shown to converge in [15]. The results of this paper and the above convexity argument indicate that stronger convergence at the $(1/n)$ rate actually holds. However, for the case of the $L^2$-induced norm sampled-data problem, an exact equivalence to a discrete-time problem can be obtained [1]. It is indicated in this paper by geometric arguments that this exact correspondence may not be possible in general for $L^\infty$-induced norm sampled-data problems.

The approach followed in this paper is readily applicable to the structured perturbations problem for sampled-data systems [16]. The minimization problem in this set-up involves spectral radius functions, and a similar result follows from the continuity of the spectral radius function. The derivation of explicit bounds takes more work and will be reported elsewhere.

Appendix  A 

In the following proofs it is assumed for simplicity that the matrices $D_{11}$ and $D_{12}$ are zero. If $D_{12}$ is not zero, the statement of Lemma 4 still holds. If $D_{11}$ is not zero, the statement of Lemma 5 does not hold; however, the main inequality does hold but has to be derived differently.

Proof  of  Lemma  4 

a) If $f \in \mathcal{R}({}^*\hat{B}_1)$, then $f(t) = {}^*\hat{B}_1(t)x = B_1' e^{A'(\tau - t)}x$, for some $x \in \mathbb{R}^{n_x}$. We may assume without loss of generality that $(A, B_1)$ is controllable, since if not, we can decompose the state space into the controllable and uncontrollable subspaces, and write

$${}^*\hat{B}_1(t) = [B_c'\ \ 0]\,\exp\!\left(\begin{bmatrix} A_c' & 0 \\ * & * \end{bmatrix}(\tau - t)\right)T,$$

where $(A_c, B_c)$ is controllable, $T$ is nonsingular, and then note that $\mathcal{R}({}^*\hat{B}_1)$ is the same as the range of $B_c' e^{A_c'(\tau - t)}$, and thus work with $(A_c, B_c)$ instead of $(A, B_1)$. We also note that since the eigenvalues of $A_c$ are a subset of the eigenvalues of $A$, then if $\tau/n$ is nonpathological for $A$, it is nonpathological for $A_c$.

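Now, to show that $({}^*\mathcal{S}_n{}^*\mathcal{H}_n)$ has a left inverse, we need to show that $({}^*\mathcal{S}_n{}^*\mathcal{H}_n): \mathcal{R}({}^*\hat{B}_1) \to L^1[0, \tau]$ is injective, but since ${}^*\mathcal{S}_n: l^1(n) \to L^1[0, \tau]$ is injective, it suffices to show that ${}^*\mathcal{H}_n: \mathcal{R}({}^*\hat{B}_1) \to l^1(n)$ is injective, or equivalently, that it has no null space. Given $f \in \mathcal{R}({}^*\hat{B}_1)$, let $\hat{f} := {}^*\mathcal{H}_n f$; since $f(t) = B_1' e^{A'(\tau - t)}x$ for some $x \in \mathbb{R}^{n_x}$, then

$$\hat{f}_i = \frac{n}{\tau}\int_{i\tau/n}^{(i+1)\tau/n} B_1' e^{A'(\tau - t)}x\,dt = \frac{n}{\tau}B_1'\int_0^{\tau/n} e^{A'(\tau/n - t)}\,dt\; e^{A'(n-i-1)\tau/n}x = \frac{n}{\tau}B_1'\Psi'(\tau/n)\,e^{A'(n-i-1)\tau/n}x,$$

with $\Psi(t) := \int_0^t e^{As}\,ds$, or in matrix notation

$$\begin{bmatrix} \hat{f}_0 \\ \vdots \\ \hat{f}_{n-1} \end{bmatrix} = \frac{n}{\tau}\begin{bmatrix} B_1'\Psi'(\tau/n)\,e^{A'(n-1)\tau/n} \\ \vdots \\ B_1'\Psi'(\tau/n) \end{bmatrix}x =: \frac{n}{\tau}\,\mathcal{C}_n' x. \qquad (16)$$

Note that for $n \ge n_x$, $\mathcal{C}_n$ contains the controllability matrix of $(e^{A\tau/n}, \Psi(\tau/n)B_1)$, and since $(A, B_1)$ is controllable and $\tau/n$ is a nonpathological sampling period, then $(e^{A\tau/n}, \Psi(\tau/n)B_1)$ is controllable, and thus the matrix $\mathcal{C}_n'$ has full rank. Therefore, if $f \in \mathcal{R}({}^*\hat{B}_1)$, $f \ne 0$, then $f = {}^*\hat{B}_1 x$ for some $x \in \mathbb{R}^{n_x}$, $x \ne 0$; consequently $\hat{f} \ne 0$ (since $\mathcal{C}_n'$ has full rank), implying that ${}^*\mathcal{H}_n$ has no null space and thus is injective.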
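The nonpathological-sampling hypothesis used above can be tested directly from the eigenvalues of $A$. The sketch below is a hedged illustration in Python and assumes the usual definition: a sampling period $h$ is pathological for $A$ when two eigenvalues of $A$ have equal real parts and imaginary parts that differ by a nonzero integer multiple of $2\pi/h$. The function name, the tolerance, and the convention that eigenvalues are supplied by the caller are all assumptions of this sketch.

```python
import math

def is_pathological(eigenvalues, period, tol=1e-9):
    """Return True if `period` is a pathological sampling period for a
    matrix with the given eigenvalues, i.e. two eigenvalues share a real
    part and their imaginary parts differ by a nonzero integer multiple
    of 2*pi/period (checked up to the tolerance `tol`)."""
    eigs = [complex(e) for e in eigenvalues]
    for i in range(len(eigs)):
        for j in range(i + 1, len(eigs)):
            lam, mu = eigs[i], eigs[j]
            if abs(lam.real - mu.real) > tol:
                continue
            k = (lam.imag - mu.imag) * period / (2.0 * math.pi)
            if round(k) != 0 and abs(k - round(k)) < tol:
                return True
    return False
```

In the setting of the proof one would call `is_pathological(eigs, tau / n)` for the fast-sampling period $\tau/n$. For example, eigenvalues $\pm j$ make the period $\pi$ pathological, while the period $1$ is not.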

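To obtain the bounds we need, it is necessary to bound the norm of $x$ that solves the equation $\hat{f} = (n/\tau)\,\mathcal{C}_n' x$ by the norm of $\hat{f}$. Since $\mathcal{C}_n'$ has full rank (as a matrix), there exists a constant $c_1$ such that if $\hat{f} = (n/\tau)\,\mathcal{C}_n' x$ then $(n/\tau)\|x\|_1 \le c_1\|\hat{f}\|_{l^1(n)}$ (where $\|x\|_1$ is the 1-norm on $\mathbb{R}^{n_x}$). The constant $c_1$ can be taken as the norm of the left inverse to $\mathcal{C}_n'$. See Appendix B for the proof that $c_1$ is independent of $n$.

If we define $\bar{f} := {}^*\mathcal{S}_n\hat{f}$, then we have from the definition of ${}^*\mathcal{S}_n: l^1(n) \to L^1[0, \tau]$ that $\|\bar{f}\|_{L^1[0,\tau]} = (\tau/n)\|\hat{f}\|_{l^1(n)}$. Combining this with the previous bound yields that for $\bar{f} \in \mathcal{R}({}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1)$,

$$\|x\|_1 \le c_1\|\bar{f}\|_{L^1[0,\tau]}.$$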

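Now, to compute a bound on $\|I - ({}^*\mathcal{S}_n{}^*\mathcal{H}_n)^{-L}\|$, let $\bar{f}$ be an element in $\mathcal{R}({}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1)$, i.e., $\bar{f} = {}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1 x$ for some $x \in \mathbb{R}^{n_x}$. We have already shown the existence of the left inverse $({}^*\mathcal{S}_n{}^*\mathcal{H}_n)^{-L}$; by its definition $({}^*\mathcal{S}_n{}^*\mathcal{H}_n)^{-L}\bar{f} = {}^*\hat{B}_1 x$, therefore

$$
\begin{aligned}
\|(I - ({}^*\mathcal{S}_n{}^*\mathcal{H}_n)^{-L})\bar{f}\|_{L^1[0,\tau]}
&= \|{}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1 x - {}^*\hat{B}_1 x\|_{L^1[0,\tau]} \\
&= \int_0^\tau \|({}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1 x)(t) - ({}^*\hat{B}_1 x)(t)\|\,dt \\
&= \sum_{i=0}^{n-1}\int_{i\tau/n}^{(i+1)\tau/n} \|({}^*\mathcal{S}_n{}^*\mathcal{H}_n{}^*\hat{B}_1 x)(t) - ({}^*\hat{B}_1 x)(t)\|\,dt \\
&= \sum_{i=0}^{n-1}\int_{i\tau/n}^{(i+1)\tau/n} \left\|\left(\frac{n}{\tau}\int_{i\tau/n}^{(i+1)\tau/n} {}^*\hat{B}_1(s)\,ds\right)x - {}^*\hat{B}_1(t)x\right\|dt \\
&\le \sum_{i=0}^{n-1}\int_{i\tau/n}^{(i+1)\tau/n} \left\|\frac{n}{\tau}\int_{i\tau/n}^{(i+1)\tau/n} {}^*\hat{B}_1(s)\,ds - {}^*\hat{B}_1(t)\right\|dt\;\|x\|_1 \\
&\le \sum_{i=0}^{n-1}\frac{\tau^2}{2n^2}\sup_{0\le t\le\tau}\left\|\frac{d\,{}^*\hat{B}_1}{dt}(t)\right\|\,\|x\|_1 \\
&\le \frac{\tau^2}{2n^2}\,\|x\|_1\sum_{i=0}^{n-1}\sup_{0\le t\le\tau}\|B_1'A'e^{A'(\tau - t)}\| \\
&\le \frac{\tau^2}{2n^2}\,c_1\|\bar{f}\|\,n\,\|B_1'\|\,\|A'\|\,e^{\|A'\|\tau} \\
&\le \frac{\tau^2}{2n}\,c_1\|B_1'\|\,\|A'\|\,e^{\|A'\|\tau}\,\|\bar{f}\|
\end{aligned}
$$

[see (18)], which means that

$$\|I - ({}^*\mathcal{S}_n{}^*\mathcal{H}_n)^{-L}\| \le \frac{\tau^2}{2}\,c_1\|B_1'\|\,\|A'\|\,e^{\|A'\|\tau}\,\frac{1}{n} =: \frac{K_i}{n}.$$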

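Proof of b): By definition, $\mathcal{O} := [\hat{C}_1\ \ \hat{D}_{12}]$,

$$\mathcal{O}(t) = [\hat{C}_1(t)\ \ \hat{D}_{12}(t)] = \left[C_1 e^{At}\ \ \ C_1\int_0^t e^{As}\,ds\,B_2\right] = [0\ \ C_1]\,\exp\!\left(\begin{bmatrix} 0 & 0 \\ I & A \end{bmatrix}t\right)\begin{bmatrix} 0 & B_2 \\ I & 0 \end{bmatrix},$$

where the last equality is a consequence of the formula

$$\int_0^t e^{As}\,ds = [0\ \ I]\,\exp\!\left(\begin{bmatrix} 0 & 0 \\ I & A \end{bmatrix}t\right)\begin{bmatrix} I \\ 0 \end{bmatrix}.$$

With an argument similar to that in the proof of part a), we can replace $[0\ \ C_1]$ and $\begin{bmatrix} 0 & 0 \\ I & A \end{bmatrix}$ by $C_0$ and $A_0$ such that $(C_0, A_0)$ is observable, i.e.,

$$[0\ \ C_1]\,\exp\!\left(\begin{bmatrix} 0 & 0 \\ I & A \end{bmatrix}t\right)\begin{bmatrix} 0 & B_2 \\ I & 0 \end{bmatrix} = [C_0\ \ 0]\,\exp\!\left(\begin{bmatrix} A_0 & 0 \\ * & * \end{bmatrix}t\right)\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = [C_0 e^{A_0 t}\ \ 0]\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = C_0 e^{A_0 t}R_1,$$

where

$$\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} := T\begin{bmatrix} 0 & B_2 \\ I & 0 \end{bmatrix}.$$

Furthermore, we can replace $R_1$ by $B_f$, which is made up of the linearly independent columns of $R_1$, and define $\mathcal{O}_0(t) := C_0 e^{A_0 t}B_f$; we then have

$$\mathcal{R}(\mathcal{O}) = \mathcal{R}\!\left([0\ \ C_1]\,\exp\!\left(\begin{bmatrix} 0 & 0 \\ I & A \end{bmatrix}t\right)\begin{bmatrix} 0 & B_2 \\ I & 0 \end{bmatrix}\right) = \mathcal{R}(C_0 e^{A_0 t}B_f) = \mathcal{R}(\mathcal{O}_0).$$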
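Now, to show the existence of $(\mathcal{H}_n\mathcal{S}_n)^{-L}$ on $\mathcal{R}(\mathcal{H}_n\mathcal{S}_n\mathcal{O})$, or equivalently, that $(\mathcal{H}_n\mathcal{S}_n)$ is injective on $\mathcal{R}(\mathcal{O})$, it suffices to show that $\mathcal{S}_n$ has no null space in $\mathcal{R}(\mathcal{O})$ (since $\mathcal{H}_n: l^\infty(n) \to L^\infty[0, \tau]$ is injective). By the representation above, if $f \in \mathcal{R}(\mathcal{O})$, $f \ne 0$, then $f(t) = C_0 e^{A_0 t}B_f x$ for some $x \ne 0$, $x \in \mathbb{R}^p$ (where $p \le n_x + n_u$). Let $\hat{f} := \mathcal{S}_n f$, then $\hat{f}_i = C_0 e^{A_0 i\tau/n}B_f x$, or in matrix notation

$$\begin{bmatrix} \hat{f}_0 \\ \vdots \\ \hat{f}_{n-1} \end{bmatrix} = \begin{bmatrix} C_0 \\ \vdots \\ C_0 (e^{A_0\tau/n})^{n-1} \end{bmatrix}B_f x =: \mathcal{O}_n B_f x. \qquad (17)$$

Since $(C_0, A_0)$ is observable and $\tau/n$ is not pathological, then $(C_0, e^{A_0\tau/n})$ is observable, implying that the matrix $\mathcal{O}_n$ has full column rank (for $n \ge 2n_x$), and since $B_f$ also has full column rank, then $\hat{f} \ne 0$, which shows that $\mathcal{S}_n$ has no null space in $\mathcal{R}(\mathcal{O})$.

To obtain the bounds we need, it is necessary to have a bound on the norm $\|x\|_\infty$ ($\|\cdot\|_\infty$ is the maximum component norm in $\mathbb{R}^p$) of solutions of the equation $\hat{f} = \mathcal{O}_n B_f x$. Since both $\mathcal{O}_n$ and $B_f$ have full column rank, they both have left inverses $\mathcal{O}_n^{-L}$, $B_f^{-L}$, and

$$\|x\|_\infty \le \|B_f^{-L}\|\,\|\mathcal{O}_n^{-L}\|\,\|\hat{f}\|_{l^\infty(n)}.$$

Since $\mathcal{H}_n: l^\infty(n) \to L^\infty[0, \tau]$ preserves norms, that is, for $\bar{f} := \mathcal{H}_n\hat{f} = (\mathcal{H}_n\mathcal{S}_n)f$, we have that $\|\bar{f}\|_{L^\infty[0,\tau]} = \|\hat{f}\|_{l^\infty(n)}$, the above bound becomes

$$\|x\|_\infty \le c_2\|\bar{f}\|_{L^\infty[0,\tau]}.$$

The proof that the bound $c_2$ is independent of $n$, though long, is entirely similar to that for $c_1$ in part a).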

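Now let $\bar{f} \in \mathcal{R}(\mathcal{H}_n\mathcal{S}_n\mathcal{O})$, therefore $\bar{f} = \mathcal{H}_n\mathcal{S}_n f$ for some $f \in \mathcal{R}(\mathcal{O})$, which means that $f = \mathcal{O}_0 x$ for some $x \in \mathbb{R}^p$. By the definition of $(\mathcal{H}_n\mathcal{S}_n)^{-L}$, we have that $(\mathcal{H}_n\mathcal{S}_n)^{-L}\bar{f} = f = \mathcal{O}_0 x$. We now compute, writing $t = \bar{t} + i\tau/n$ with $\bar{t} \in [0, \tau/n]$,

$$
\begin{aligned}
\|(I - (\mathcal{H}_n\mathcal{S}_n)^{-L})\bar{f}\|_{L^\infty[0,\tau]}
&= \sup_{0\le t\le\tau}\|(\mathcal{H}_n\mathcal{S}_n\mathcal{O}_0 x)(t) - (\mathcal{O}_0 x)(t)\| \\
&= \sup_{0\le i\le n-1}\ \sup_{0\le\bar{t}\le\tau/n}\|(\mathcal{H}_n\mathcal{S}_n\mathcal{O}_0 x)(\bar{t} + i\tau/n) - (\mathcal{O}_0 x)(\bar{t} + i\tau/n)\| \\
&= \sup_{0\le i\le n-1}\ \sup_{0\le\bar{t}\le\tau/n}\|(\mathcal{O}_0 x)(i\tau/n) - (\mathcal{O}_0 x)(\bar{t} + i\tau/n)\| \\
&\le \sup_{0\le i\le n-1}\ \sup_{0\le\bar{t}\le\tau/n}\left\|\int_{i\tau/n}^{\bar{t}+i\tau/n}\frac{d\mathcal{O}_0}{dt}(s)\,ds\right\|\,\|x\|_\infty \\
&\le \sup_{0\le i\le n-1}\ \sup_{0\le\bar{t}\le\tau/n}\int_{i\tau/n}^{\bar{t}+i\tau/n}\left\|\frac{d\mathcal{O}_0}{dt}(s)\right\|ds\;\|x\|_\infty \\
&\le \sup_{0\le i\le n-1}\ \sup_{0\le t\le\tau}\left\|\frac{d\mathcal{O}_0}{dt}(t)\right\|\frac{\tau}{n}\,\|x\|_\infty \\
&\le \sup_{0\le i\le n-1}\ \sup_{0\le t\le\tau}\|C_0\|\,\|A_0\|\,e^{\|A_0\|\tau}\,\|B_f\|\,\frac{\tau}{n}\,\|x\|_\infty \\
&\le \|C_0\|\,\|A_0\|\,e^{\|A_0\|\tau}\,\frac{\tau}{n}\,\|B_f\|\,c_2\,\|\bar{f}\|_{L^\infty[0,\tau]},
\end{aligned}
$$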

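which results in

$$\|I - (\mathcal{H}_n\mathcal{S}_n)^{-L}\| \le \|C_0\|\,\|A_0\|\,e^{\|A_0\|\tau}\,\|B_f\|\,c_2\,\tau\,\frac{1}{n} =: \frac{K_o}{n}.$$

Proof of Lemma 5

If $\hat{D}_{11}$ comes from the lifting of a MIMO $G_{11}$, then $\hat{D}_{11}$ operates on vector signals, i.e., $\hat{D}_{11}: L^\infty_n[0, \tau] \to L^\infty_m[0, \tau]$. The induced norm of such an operator is bounded above by the maximum row sum of the matrix of the $L^\infty[0, \tau]$-induced norms of the SISO subsystems. We will prove the lemma as if $\hat{D}_{11}$ is scalar; the MIMO statement follows from the fact that if each entry in the matrix of norms tends to 0 separately, then the maximum row sum will also tend to zero.

The $L^\infty[0, \tau]$-induced norm of an operator $\mathcal{A}$ given by a kernel function $\mathcal{A}(t, s)$ is

$$\|\mathcal{A}\| = \sup_{0\le t\le\tau}\int_0^\tau |\mathcal{A}(t, s)|\,ds.$$

The kernel function of $\hat{D}_{11}$ is given from (1) by

$$\hat{D}_{11}(t, s) = C_1 e^{A(t-s)}\,1(t-s)\,B_1.$$

The approximating operator $\bar{D}_n$ has a kernel function which is piecewise constant over squares of width $\tau/n$ in $[0, \tau] \times [0, \tau]$; in particular, for $t = \bar{t} + i\tau/n$ and $s = \bar{s} + j\tau/n$, $\bar{t}, \bar{s} \in [0, \tau/n]$,

$$\bar{D}_n(t, s) = \frac{n}{\tau}\,C_1 e^{Ai\tau/n}\left(\int_{j\tau/n}^{(j+1)\tau/n} e^{-Ar}\,dr\right)1(i-j-1)\,B_1,$$

where $1(\cdot)$ is the unit step function with a discrete parameter. We now compute

$$
\begin{aligned}
\|\hat{D}_{11} - \bar{D}_n\|
&= \sup_{0\le t\le\tau}\int_0^\tau |\hat{D}_{11}(t, s) - \bar{D}_n(t, s)|\,ds \\
&= \sup_{0\le i\le n-1}\ \sup_{0\le\bar{t}\le\tau/n}\ \sum_{j=0}^{n-1}\int_{j\tau/n}^{(j+1)\tau/n} |\hat{D}_{11}(\bar{t} + i\tau/n, s) - \bar{D}_n(\bar{t} + i\tau/n, s)|\,ds \\
&\le \|C_1\|\,\|B_1\|\,\sup_i e^{\|A\| i(\tau/n)}\,\sup_{\bar{t}}\left\{\sum_{j=0}^{i-1}\int_{j\tau/n}^{(j+1)\tau/n}\left\|e^{A(\bar{t}-s)} - \frac{n}{\tau}\int_{j\tau/n}^{(j+1)\tau/n} e^{-Ar}\,dr\right\|ds + \int_{i\tau/n}^{(i+1)\tau/n}\|e^{A(\bar{t}-s)}\|\,ds\right\} \\
&\le \|C_1\|\,\|B_1\|\,e^{\|A\|\tau}\,\sup_i\ \sup_{\bar{t}}\left\{\sum_{j=0}^{i-1}\int_{j\tau/n}^{(j+1)\tau/n}\left\|e^{A(\bar{t}-s)} - \frac{n}{\tau}\int_{j\tau/n}^{(j+1)\tau/n} e^{-Ar}\,dr\right\|ds + \int_{i\tau/n}^{(i+1)\tau/n}\|e^{A(\bar{t}-s)}\|\,ds\right\},
\end{aligned}
$$

where the last term represents the case $i = j$. From (20) in Appendix B, we can bound

$$
\begin{aligned}
\int_{j\tau/n}^{(j+1)\tau/n}\left\|e^{A(\bar{t}-s)} - \frac{n}{\tau}\int_{j\tau/n}^{(j+1)\tau/n} e^{-Ar}\,dr\right\|ds
&\le \frac{\tau}{n}\,\|e^{A(\bar{t} - j(\tau/n))} - e^{-Aj(\tau/n)}\| + \frac{1}{2}\left(\sup_{j(\tau/n)\le s\le(j+1)\tau/n}\|Ae^{A(\bar{t}-s)}\| + \sup_{j(\tau/n)\le r\le(j+1)\tau/n}\|Ae^{-Ar}\|\right)\frac{\tau^2}{n^2} \\
&\le \frac{\tau}{n}\,e^{\|A\| j(\tau/n)}\,\|e^{A\bar{t}} - I\| + \|A\|\,e^{\|A\|\tau}\,\frac{\tau^2}{n^2} \\
&\le e^{\|A\|\tau}\,\frac{\tau}{n}\left((e^{\|A\|\tau/n} - 1) + \|A\|\frac{\tau}{n}\right).
\end{aligned}
$$

Substituting back yields

$$
\begin{aligned}
\|\hat{D}_{11} - \bar{D}_n\|
&\le \|C_1\|\,\|B_1\|\,e^{2\|A\|\tau}\,\sup_i\ \sup_{\bar{t}}\left\{\sum_{j=0}^{i-1}\frac{\tau}{n}\left((e^{\|A\|\tau/n} - 1) + \|A\|\frac{\tau}{n}\right) + \frac{\tau}{n}\right\} \\
&\le \|C_1\|\,\|B_1\|\,e^{2\|A\|\tau}\left[\frac{\tau}{n}(e^{\|A\|\tau} - 1) + \|A\|\frac{\tau^2}{n} + \frac{\tau}{n}\right].
\end{aligned}
$$

Appendix B

Integral Inequalities

Let $F(t)$, $F_1(t)$, $F_2(t)$ be differentiable matrix valued functions. Some useful bounds shown below can be established by using the formula

$$F(t) = F(a) + \int_a^t \frac{dF}{dt}(s)\,ds,$$

and some manipulations involving cancelling common factors and bounding the norm of an integral by the integral of the norms. Note that in the following bounds, $\|\cdot\|$ is any matrix norm provided that the same norm is used on both sides of the same inequality.

$$\int_a^b\left\|F(t) - \frac{1}{b-a}\left(\int_a^b F(s)\,ds\right)\right\|dt \le \frac{1}{2}(b-a)^2\sup_{a\le t\le b}\left\|\frac{dF}{dt}(t)\right\| \qquad (18)$$

$$\left\|\int_a^b F(t)F'(t)\,dt - \frac{1}{b-a}\left(\int_a^b F(s)\,ds\right)\left(\int_a^b F'(r)\,dr\right)\right\| \le 2(b-a)^3\left(\sup_{a\le t\le b}\left\|\frac{dF}{dt}(t)\right\|\right)^2 \qquad (19)$$

$$\int_a^b\left\|F_1(t) - \frac{1}{b-a}\int_a^b F_2(r)\,dr\right\|dt \le |b-a|\,\|F_1(a) - F_2(a)\| + \frac{1}{2}\left(\sup_{a\le t\le b}\left\|\frac{dF_1}{dt}(t)\right\| + \sup_{a\le t\le b}\left\|\frac{dF_2}{dt}(t)\right\|\right)|b-a|^2. \qquad (20)$$

Completion of Proof of Lemma 4-a)

Claim: $c_1$ is independent of $n$.

Proof: We will construct $c_1$ as an upper bound on the norm of the left inverse to $\mathcal{C}_n'$. This is done by taking the pseudo-inverse as a left inverse to $\mathcal{C}_n'$, and finding a bound on its norm that is independent of $n$. The pseudo-inverse to $\mathcal{C}_n'$ is $(\mathcal{C}_n\mathcal{C}_n')^{-1}\mathcal{C}_n$, and note that the inverse exists since $\mathcal{C}_n'$ has full column rank. We first bound $\|((n/\tau)\,\mathcal{C}_n\mathcal{C}_n')^{-1}\|$. From the definition of $\mathcal{C}_n$ we have

$$\frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n' = \frac{n}{\tau}\sum_{i=0}^{n-1} e^{Ai\tau/n}\,\Psi(\tau/n)B_1B_1'\Psi'(\tau/n)\,e^{A'i\tau/n}.$$

Denote the controllability Grammian over the finite time $\tau$ by

$$W_\tau := \int_0^\tau e^{At}B_1B_1'e^{A't}\,dt.$$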


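We will first show that $(n/\tau)\,\mathcal{C}_n\mathcal{C}_n' \to W_\tau$ as $n \to \infty$:

$$
\begin{aligned}
\left\|W_\tau - \frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right\|
&= \left\|\int_0^\tau e^{At}B_1B_1'e^{A't}\,dt - \frac{n}{\tau}\sum_{i=0}^{n-1} e^{Ai\tau/n}\,\Psi(\tau/n)B_1B_1'\Psi'(\tau/n)\,e^{A'i\tau/n}\right\| \\
&= \left\|\sum_{i=0}^{n-1}\left(\int_{i\tau/n}^{(i+1)\tau/n} e^{At}B_1B_1'e^{A't}\,dt - \frac{n}{\tau}\,e^{Ai\tau/n}\,\Psi(\tau/n)B_1B_1'\Psi'(\tau/n)\,e^{A'i\tau/n}\right)\right\| \\
&\le \sum_{i=0}^{n-1} e^{2\|A\| i\tau/n}\;2\,\frac{\tau^3}{n^3}\,e^{2\|A\|\tau}\,\|A\|^2\,\|B_1\|^2, \qquad (21)
\end{aligned}
$$

where the last step is a consequence of formula (19). After bounding $e^{2\|A\| i\tau/n} \le e^{2\|A\|\tau}$ and summing to yield a factor of $n$, (21) becomes

$$\left\|W_\tau - \frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right\| \le 2e^{4\|A\|\tau}\,\|B_1\|^2\,\|A\|^2\,\frac{\tau^3}{n^2} =: M_1\frac{1}{n^2},$$

where $M_1$ is a constant. Now, since $(n/\tau)\,\mathcal{C}_n\mathcal{C}_n' \to W_\tau$ as $n \to \infty$, it follows that $((n/\tau)\,\mathcal{C}_n\mathcal{C}_n')^{-1} \to W_\tau^{-1}$ [18, theorem 10.12]. An explicit bound (for large $n$) on the norm of $((n/\tau)\,\mathcal{C}_n\mathcal{C}_n')^{-1}$ in terms of the norm of $W_\tau^{-1}$ can be constructed in several ways; one way is by [18, theorem 10.11]

$$\left\|\left(\frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right)^{-1}\right\| \le 2\|W_\tau^{-1}\| =: M_2$$

for $n^2 > 2M_1\|W_\tau^{-1}\|$. To take care of the case of $n$ such that $n^2 \le 2M_1\|W_\tau^{-1}\|$, note that there is only a finite number of such $n$'s, and let $M_3$ be the maximum of $\|((n/\tau)\,\mathcal{C}_n\mathcal{C}_n')^{-1}\|$ over this finite set of $n$'s (note also that $((n/\tau)\,\mathcal{C}_n\mathcal{C}_n')^{-1}$ exists if $n \ge n_x$ and $\tau/n$ is not a pathological sampling period). Letting $M_4 := \max\{M_2, M_3\}$, we obtain

$$\left\|\left(\frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right)^{-1}\right\| \le M_4 \qquad \forall n \ge n_x \text{ such that } \tau/n \text{ is not pathological.}$$

Finally, to find $\|(\mathcal{C}_n\mathcal{C}_n')^{-1}\mathcal{C}_n\|$, note that this is the induced norm from $l^1(n)$ to $\mathbb{R}^{n_x}$ with the $\|\cdot\|_1$ norm, i.e., it is the maximum column sum norm on the matrix, therefore

$$
\begin{aligned}
\|(\mathcal{C}_n\mathcal{C}_n')^{-1}\mathcal{C}_n\|
&= \left\|\left(\frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right)^{-1}\frac{n}{\tau}\,\Psi(\tau/n)\left[(e^{A\tau/n})^{n-1}B_1\ \cdots\ B_1\right]\right\| \\
&\le \frac{n}{\tau}\left\|\left(\frac{n}{\tau}\,\mathcal{C}_n\mathcal{C}_n'\right)^{-1}\right\|\,\|\Psi(\tau/n)\|\,\max\left\{\|(e^{A\tau/n})^{n-1}\|, \cdots, \|e^{A\tau/n}\|\right\}\|B_1\| \\
&\le \frac{n}{\tau}\,M_4\,\frac{\tau}{n}\,e^{\|A\|\tau/n}\,e^{\|A\|\tau}\,\|B_1\| \\
&\le M_4\,e^{2\|A\|\tau}\,\|B_1\| =: c_1,
\end{aligned}
$$

since $\|\Psi(\tau/n)\| = \|\int_0^{\tau/n} e^{As}\,ds\| \le \int_0^{\tau/n} e^{\|A\|s}\,ds \le (\tau/n)\,e^{\|A\|\tau/n}$. This yields the desired bound $c_1$ which is independent of $n$. $\square$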
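The convergence $(n/\tau)\,\mathcal{C}_n\mathcal{C}_n' \to W_\tau$ at a $1/n^2$ rate can be checked in the scalar case. The Python sketch below is illustrative only: it assumes a scalar system ($A = a \ne 0$, $B_1 = b$), for which both quantities have closed forms, and the function names are hypothetical.

```python
import math

def gramian(a, b, tau):
    """W_tau = integral_0^tau e^{at} b b e^{at} dt for scalar a != 0."""
    return b * b * (math.exp(2.0 * a * tau) - 1.0) / (2.0 * a)

def sampled_gramian(a, b, tau, n):
    """(n/tau) * sum_i e^{a i h} Psi(h) b b Psi(h) e^{a i h}, h = tau/n,
    where Psi(h) = integral_0^h e^{as} ds."""
    h = tau / n
    psi = (math.exp(a * h) - 1.0) / a
    s = sum(math.exp(2.0 * a * i * h) for i in range(n))
    return (n / tau) * psi * psi * b * b * s

def gap(a, b, tau, n):
    """Absolute difference between W_tau and its sampled approximation."""
    return abs(gramian(a, b, tau) - sampled_gramian(a, b, tau, n))
```

With, say, `a = -1.0`, `b = 1.0`, `tau = 1.0`, doubling `n` reduces `gap` by a factor close to four, in line with a bound of the form $M_1/n^2$.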

Existence of Preadjoints

Given an operator $H: Z^* \to X^*$, where $Z^*$ is the dual of some Banach space $Z$, its preadjoint ${}^*H$ is such that ${}^*H: X \to Z$ and $({}^*H)^* = H$. Not every operator has a preadjoint, but the operators that we are dealing with do. For example, $\hat{B}_1: L^\infty[0, \tau] \to \mathbb{R}^{n_x}$ has a preadjoint ${}^*\hat{B}_1: \mathbb{R}^{n_x} \to L^1[0, \tau]$. Let $\hat{B}_1(s)$ denote the matrix valued kernel function representing the operator $\hat{B}_1$; it is very easy to check that the operator from $\mathbb{R}^{n_x}$ to $L^1[0, \tau]$ given by the matrix valued kernel function $\hat{B}_1'(t)$ (here $'$ denotes matrix transpose) is a preadjoint to the operator $\hat{B}_1$.

In the past decade, a powerful theory for designing robust control systems has emerged. Starting with a model, and a description of the uncertainty (structured, parametric etc.), a controller can be designed to meet a variety of performance specifications. This development, however, has not been accompanied by a parallel development in system identification methods by which a plant model and a description of uncertainty is provided. In an attempt to bridge this gap, a new area of research in robust identification has emerged in the last few years.